Nvidia announced today that the A100, the first of its GPUs based on the new Ampere architecture, is now in full production and has begun shipping to customers globally. Ampere is a big generational jump in Nvidia’s GPU architecture design, providing what the company says is the “largest leap in performance to date” across all eight generations of its graphics hardware.
Specifically, the A100 can improve performance on AI training and inference by as much as 20x relative to prior Nvidia data center GPUs, and it offers advantages across just about any kind of GPU-intensive data center workload, including data analytics, protein modeling and other scientific computing uses, and cloud-based graphics rendering.
The A100 GPU can also be scaled up or down depending on need: a single unit can be partitioned to handle as many as seven separate tasks, and multiple units can be combined to work together as one large, virtual GPU to tackle the toughest AI training tasks. The “Multi-Instance GPU” partitioning feature in particular is new to this generation, and it underscores the A100’s value for clients of all sizes: a single A100 could in theory replace up to seven discrete GPUs in a data center, provided those existing GPUs have some headroom in their usage.
Alongside the production and shipping announcement, Nvidia said that a number of customers are already adopting the A100 for use in their supercomputers and data centers, including Microsoft Azure, Amazon Web Services, Google Cloud, and just about every other major cloud provider.
Nvidia also announced the DGX A100 system, which combines eight A100 GPUs linked together using Nvidia’s NVLink interconnect. It’s available immediately, both directly from Nvidia and through its approved resale partners.