AWS introduces new Trn1 instances to speed up training of machine learning models

As more companies move to custom silicon for their customers' workloads, Amazon has been busy on this front. It introduced the Inferentia chip in 2019 to speed up inference, the work of running already-trained models. Then last year the company launched Trainium, its second machine learning chip, designed specifically for training machine learning models. Today, AWS built on that work by introducing Trn1, its latest machine learning instance, which is powered by Trainium.

Adam Selipsky, delivering his first AWS re:Invent keynote, announced the new instance on stage in Las Vegas this morning.

“So today, I’m excited to announce the new Trn1 instance powered by Trainium, which we expect to deliver the best price-performance for training deep learning models in the cloud and the fastest on EC2,” Selipsky told the re:Invent audience.

“Trn1 is the first EC2 instance with up to 800 gigabytes per second bandwidth. So it’s absolutely great for large scale, multinode distributed training use cases.” He said the instances should suit use cases such as image recognition, natural language processing, fraud detection and forecasting.

What's more, AWS can network these instances together into what it calls “ultra clusters” for even greater performance.

“We can network these together and what we call Ultra clusters consisting of tens of thousands of training accelerators interconnected with petabyte scale networking. These training Ultra clusters are powered by a powerful machine learning supercomputer for rapidly training the most complex, deep learning models with trillions of parameters,” Selipsky said.

The company also plans to work with partners like SAP to take advantage of this new processing power, Selipsky said.