Meta has unveiled the latest version of its Meta Training and Inference Accelerator (MTIA), a significant step in building out its in-house AI infrastructure. The updated MTIA is designed to support a broad range of AI-driven applications, including generative AI products, recommendation systems, and advanced AI research. The upgrade comes as Meta anticipates rising computational demands from increasingly sophisticated AI models.
Key improvements in the new MTIA include more than double the compute and memory bandwidth of its first-generation predecessor, enabling the accelerator to handle complex, memory-intensive AI operations more efficiently. The next-generation MTIA’s enhanced specifications include:
- Compute Performance: Dense compute performance has increased 3.5x over the first generation, and sparse compute performance has increased 7x.
- Memory Upgrades: Local memory per processing element (PE) has grown from 128 KB to 384 KB, on-chip memory from 128 MB to 256 MB, and off-chip LPDDR5 memory from 64 GB to 128 GB.
- Bandwidth Improvements: Local memory bandwidth per PE has risen from 400 GB/s to 1 TB/s, and on-chip memory bandwidth from 800 GB/s to 2.7 TB/s (the quick check after this list verifies these ratios).
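These figures imply different scaling factors across the memory hierarchy. As a quick sanity check, the short Python snippet below computes the generation-over-generation ratios directly from the numbers quoted above (all values come from Meta's published specifications):

```python
# Generation-over-generation ratios computed from the published MTIA specs.
specs = {
    # metric: (first_gen, next_gen), in matching units
    "Local memory per PE (KB)":      (128, 384),
    "On-chip memory (MB)":           (128, 256),
    "Off-chip LPDDR5 (GB)":          (64, 128),
    "Local memory BW per PE (GB/s)": (400, 1000),  # 1 TB/s = 1000 GB/s
    "On-chip memory BW (GB/s)":      (800, 2700),  # 2.7 TB/s = 2700 GB/s
}

for metric, (first_gen, next_gen) in specs.items():
    print(f"{metric}: {next_gen / first_gen:.2f}x")
```

Capacity roughly doubles to triples across the hierarchy, while on-chip memory bandwidth sees the largest jump at about 3.4x, which fits the memory-intensive recommendation workloads MTIA targets.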
These advancements are part of Meta’s strategic shift toward reducing its dependency on third-party hardware providers such as NVIDIA, in favor of a more self-reliant, optimized hardware ecosystem. The move aligns with Meta’s broader goal of controlling and integrating its hardware and software stack more tightly, creating scalable, efficient infrastructure for its vast and diverse AI applications.
Specification | First Gen MTIA | Next Gen MTIA |
---|---|---|
Technology | TSMC 7nm | TSMC 5nm |
Frequency | 800 MHz | 1.35 GHz |
Instances | 1.12B gates, 65M flip-flops | 2.35B gates, 103M flip-flops |
Area | 19.34 mm x 19.1 mm, 373 mm² | 25.6 mm x 16.4 mm, 421 mm² |
Package | 43 mm x 43 mm | 50 mm x 40 mm |
Voltage | 0.67 V logic, 0.75 V memory | 0.85 V |
TDP | 25 W | 90 W |
Host Connection | 8x PCIe Gen4 (16 GB/s) | 8x PCIe Gen5 (32 GB/s) |
GEMM TOPS | 102.4 TFLOPS (INT8); 51.2 TFLOPS (FP16/BF16) | 708 TFLOPS (INT8, with sparsity); 354 TFLOPS (INT8); 354 TFLOPS (FP16/BF16, with sparsity); 177 TFLOPS (FP16/BF16) |
SIMD TOPS | Vector core: 3.2 TFLOPS (INT8), 1.6 TFLOPS (FP16/BF16), 0.8 TFLOPS (FP32); SIMD: 3.2 TFLOPS (INT8/FP16/BF16), 1.6 TFLOPS (FP32) | Vector core: 11.06 TFLOPS (INT8), 5.53 TFLOPS (FP16/BF16), 2.76 TFLOPS (FP32); SIMD: 5.53 TFLOPS (INT8/FP16/BF16), 2.76 TFLOPS (FP32) |
Memory Capacity | Local memory: 128 KB per PE; on-chip: 128 MB; off-chip LPDDR5: 64 GB | Local memory: 384 KB per PE; on-chip: 256 MB; off-chip LPDDR5: 128 GB |
Memory Bandwidth | Local memory: 400 GB/s per PE; on-chip: 800 GB/s; off-chip LPDDR5: 176 GB/s | Local memory: 1 TB/s per PE; on-chip: 2.7 TB/s; off-chip LPDDR5: 204.8 GB/s |
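The host-connection figures in the table follow directly from PCIe per-lane rates. As a rough check, assuming the commonly cited usable bandwidth of about 2 GB/s per lane for Gen4 and 4 GB/s per lane for Gen5 (after encoding overhead):

```python
# Approximate usable PCIe bandwidth per lane in one direction, after
# encoding overhead; these are the commonly cited round figures.
GEN4_GB_PER_S_PER_LANE = 2.0
GEN5_GB_PER_S_PER_LANE = 4.0
LANES = 8

print(f"PCIe Gen4 x{LANES}: ~{LANES * GEN4_GB_PER_S_PER_LANE:.0f} GB/s")  # ~16 GB/s
print(f"PCIe Gen5 x{LANES}: ~{LANES * GEN5_GB_PER_S_PER_LANE:.0f} GB/s")  # ~32 GB/s
```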
Alongside the hardware upgrades, Meta has also refined the MTIA software stack to improve integration and operational efficiency. The stack is designed to work seamlessly with PyTorch 2.0, with features that boost developer productivity and computational performance. Developing hardware and software in tandem underscores Meta’s commitment to advancing its internal AI capabilities and, ultimately, to significantly improving the performance and efficiency of its AI deployments.
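Meta has not published the details of the MTIA backend, but the PyTorch 2.0 entry point such a stack plugs into is `torch.compile`, which captures a model as a graph that a compiler backend can then lower to its target hardware. The sketch below is illustrative only: `TinyRanker` is a hypothetical stand-in for a small recommendation-style model, and it runs on PyTorch's default (CPU) compile path rather than on MTIA.

```python
import torch
import torch.nn as nn

class TinyRanker(nn.Module):
    """Hypothetical stand-in for a small recommendation-style scoring model."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)

model = TinyRanker()

# torch.compile (PyTorch 2.0+) traces the model into a graph and hands it to
# a compiler backend; on an MTIA host, a vendor backend would lower this
# graph to the accelerator instead of the default CPU path used here.
compiled_model = torch.compile(model)

scores = compiled_model(torch.randn(32, 256))
print(scores.shape)  # torch.Size([32, 1])
```

The appeal of this design is that models written in standard PyTorch need no source changes to target new hardware; the backend swap happens beneath the `torch.compile` boundary.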