AWS Officially Launches Trainium2 Chips for Training and Deploying Large Language Models
Elliot Kim
AWS has officially launched its Trainium2 (T2) chips, designed specifically for training and deploying large language models (LLMs). The new chips, first announced a year ago, boast a significant performance boost, with a single Trainium2-powered EC2 instance featuring 16 T2 chips capable of delivering up to 20.8 petaflops of compute performance.
This substantial increase in performance is expected to have a significant impact on the field of artificial intelligence. According to AWS, running inference for Meta's massive Llama 405B model as part of Amazon's Bedrock LLM platform will offer "3x higher token-generation throughput compared to other available offerings by major cloud providers."
The Trainium2 chips will also be deployed in AWS' EC2 Trn2 UltraServers, which feature 64 interconnected Trainium2 chips. These instances can scale up to 83.2 peak petaflops of compute, making them an attractive option for organizations working with large language models. An AWS spokesperson clarified that the 20.8 petaflops performance numbers are for dense models and FP8 precision, while the 83.2 petaflops value is for FP8 with sparse models.
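The quoted instance and UltraServer figures imply a consistent per-chip rate. A quick back-of-the-envelope check, using only the numbers cited above, shows the 64-chip figure is a straight 4x scale-up of the 16-chip instance:

```python
# Sanity check of the article's quoted Trainium2 figures (assumed as given):
# a 16-chip Trn2 instance delivers up to 20.8 petaflops, and a 64-chip
# UltraServer scales to 83.2 peak petaflops.

chips_per_instance = 16
instance_petaflops = 20.8

chips_per_ultraserver = 64
ultraserver_petaflops = 83.2

per_chip = instance_petaflops / chips_per_instance   # ~1.3 petaflops per chip
scaled = per_chip * chips_per_ultraserver            # implied UltraServer total

print(round(per_chip, 2))  # per-chip petaflops
print(round(scaled, 1))    # matches the quoted 83.2-petaflop figure
```

In other words, the UltraServer number is four Trn2 instances' worth of chips at the same per-chip rate, though as noted above the two figures are quoted at different precision/sparsity settings.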
AWS is working closely with Anthropic, a leading LLM provider, to build a massive cluster of these UltraServers featuring "hundreds of thousands of Trainium2 chips." AWS expects this cluster to deliver five times the exaflops of compute of the cluster Anthropic used to train its current generation of models, and says it will be the world's largest AI compute cluster reported to date.
The performance specs of the Trainium2 chips are an improvement over Nvidia's current generation of GPUs, which remain in high demand and short supply. However, Nvidia's next-gen Blackwell chips, promised to arrive early next year, will dwarf the Trainium2's performance with up to 720 petaflops of FP8 performance in a rack with 72 Blackwell GPUs.
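Dividing the rack-level numbers through gives a rough per-chip comparison, with the caveat (an assumption here) that the two vendors' figures may not be like-for-like on sparsity:

```python
# Rough per-chip FP8 comparison using the article's figures (assumed):
# a 72-GPU Blackwell rack at up to 720 petaflops vs. a 64-chip Trainium2
# UltraServer at 83.2 peak petaflops. Note: the figures may mix sparse and
# dense numbers, so treat the ratio as an order-of-magnitude estimate.

blackwell_rack_pf = 720
blackwell_gpus = 72
trainium2_ultraserver_pf = 83.2
trainium2_chips = 64

per_blackwell_gpu = blackwell_rack_pf / blackwell_gpus           # 10.0 petaflops
per_trainium2_chip = trainium2_ultraserver_pf / trainium2_chips  # ~1.3 petaflops

print(round(per_blackwell_gpu / per_trainium2_chip, 1))  # per-chip ratio
```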
In a move that showcases AWS' commitment to staying at the forefront of AI innovation, the company has already announced its next-generation Trainium3 chips. Built on a 3-nanometer process, the Trainium3 is expected to deliver another 4x performance gain for UltraServers and is slated for release in late 2025. This rapid release cycle is a testament to AWS' dedication to providing customers with the tools they need to build and deploy increasingly complex AI models.
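If the promised 4x gain applies to the UltraServer figure quoted earlier (an assumption; AWS has not broken the claim down), the projection is straightforward:

```python
# Projecting the quoted 4x Trainium3 gain onto the current UltraServer
# figure. Assumption: the 4x multiplier applies to the 83.2-petaflop number.
ultraserver_pf = 83.2
trainium3_gain = 4
print(ultraserver_pf * trainium3_gain)  # ~332.8 petaflops per UltraServer
```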
"Trainium2 is the highest performing AWS chip created to date," said David Brown, vice president of Compute and Networking at AWS. "And with models approaching trillions of parameters, we knew customers would need a novel approach to train and run those massive models. The new Trn2 UltraServers offer the fastest training and inference performance on AWS for the world's largest models. And with our third-generation Trainium3 chips, we will enable customers to build bigger models faster and deliver superior real-time performance when deploying them."
The Trn2 instances are now generally available in AWS' US East (Ohio) region, with other regions set to launch soon. The UltraServers are currently in preview, with further details on availability expected in the coming months.
As the AI landscape continues to evolve, AWS' Trainium2 and Trainium3 chips are poised to play a significant role in shaping the future of large language models and artificial intelligence as a whole. With their impressive performance specs and rapid release cycle, AWS is sending a clear message to the industry: it's committed to staying at the forefront of AI innovation and providing customers with the tools they need to build and deploy the most advanced AI models.
Copyright © 2024 Starfolk. All rights reserved.