Nvidia is unveiling its next-generation Ampere GPU architecture today. The first GPU to use Ampere will be Nvidia’s new A100, built for scientific computing, cloud graphics, and data analytics. While there have been plenty of rumors around Nvidia’s Ampere plans for GeForce “RTX 3080” cards, the A100 will primarily be used in data centers.
Nvidia’s latest data center push comes amid a pandemic and a huge increase in demand for cloud computing. Describing the coronavirus situation as “terribly tragic,” Nvidia CEO Jensen Huang noted in a press briefing attended by The Verge that “cloud usage of services are going to see a surge.” “Those dynamics are really quite good for our data center business … My expectation is that Ampere is going to do remarkably well. It’s our best data center GPU ever made and it capitalizes on nearly a decade of our data center experience.”
The A100 sports more than 54 billion transistors, making it the world’s largest 7nm processor. “That is basically at nearly the theoretical limits of what’s possible in semiconductor manufacturing today,” explains Huang. “The largest die the world’s ever made, and the largest number of transistors in a compute engine the world’s ever made.”
Nvidia is boosting its Tensor cores to make them easier to use for developers, and the A100 will also include 19.5 teraflops of FP32 performance, 6,912 CUDA cores, 40GB of memory, and 1.6TB/s of memory bandwidth. All of this performance isn’t going into powering the latest version of Assassin’s Creed, though.
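For a sense of where that FP32 number comes from, it lines up with a simple peak-throughput estimate based on the CUDA core count, assuming a boost clock of roughly 1.41GHz (a figure Nvidia didn’t cite in this briefing) and two floating-point operations per core per clock:

```python
# Rough peak-FP32 estimate for the A100 (sketch; the ~1.41GHz boost clock
# is an assumption, not a figure quoted in Nvidia's briefing).
cuda_cores = 6912
flops_per_core_per_clock = 2   # a fused multiply-add counts as two FLOPs
boost_clock_hz = 1.41e9        # assumed boost clock

peak_fp32_tflops = cuda_cores * flops_per_core_per_clock * boost_clock_hz / 1e12
print(f"{peak_fp32_tflops:.1f} TFLOPS")  # ~19.5 TFLOPS, matching the quoted spec
```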
Instead, Nvidia is combining these GPUs into a stacked AI system that will power its supercomputers in data centers around the world. Much like how Nvidia used its previous Volta architecture to create the Tesla V100 and DGX systems, a new DGX A100 AI system combines eight of these A100 GPUs into a single giant GPU.
The DGX A100 system promises 5 petaflops of performance, thanks to these eight A100s, and they’re being combined using Nvidia’s third-generation version of NVLink. Combining these eight GPUs means there’s 320GB of GPU memory with 12.4TB/s of memory bandwidth. Nvidia is also including 15TB of Gen4 NVMe internal storage to power AI training tasks. Researchers and scientists using the DGX A100 systems will even be able to split workloads into up to 56 instances, spreading smaller tasks across the powerful GPUs.
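Those system-level figures are essentially the per-GPU specs multiplied by eight; here’s a minimal sketch of the arithmetic, using an unrounded per-GPU bandwidth of roughly 1.56TB/s (the 1.6TB/s figure above is rounded), which is how eight GPUs land at 12.4TB/s rather than 12.8TB/s:

```python
# Aggregate DGX A100 figures from eight A100 GPUs (sketch).
gpus = 8
memory_per_gpu_gb = 40
bandwidth_per_gpu_tbs = 1.555   # ~1.6TB/s once rounded, as quoted per GPU

total_memory_gb = gpus * memory_per_gpu_gb          # 320GB of GPU memory
total_bandwidth_tbs = gpus * bandwidth_per_gpu_tbs  # ~12.4TB/s of memory bandwidth
print(total_memory_gb, round(total_bandwidth_tbs, 1))  # 320 12.4
```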
Nvidia’s recent $6.9 billion acquisition of Mellanox, a server networking supplier, is also coming into play, as the DGX A100 includes nine 200Gb/s network interfaces for a total of 3.6Tb/s of bidirectional bandwidth. As modern data centers adapt to increasingly diverse workloads, Mellanox’s technology will prove ever more important for Nvidia. Huang describes Mellanox as the all-important “connecting tissue” in the next generation of data centers.
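The 3.6Tb/s number is simply the nine 200Gb/s interfaces counted in both directions; a quick sketch:

```python
# Aggregate network bandwidth of the DGX A100's nine Mellanox interfaces (sketch).
interfaces = 9
per_interface_gbps = 200
directions = 2  # "bidirectional" counts send and receive

total_tbps = interfaces * per_interface_gbps * directions / 1000
print(total_tbps)  # 3.6 Tb/s
```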
“If you take a look at the way modern data centers are architected, the workloads they have to do are more diverse than ever,” explains Huang. “Our approach going forward is not to just focus on the server itself but to think about the entire data center as a computing unit. Going forward I believe the world is going to think about data centers as a computing unit and we’re going to be thinking about data center-scale computing. No longer just personal computers or servers, but we’re going to be operating on the data center scale.”
Nvidia’s DGX A100 systems have already begun shipping, with some of the first applications including research into COVID-19 conducted at the US Argonne National Laboratory.
“We’re using America’s most powerful supercomputers in the fight against COVID-19, running AI models and simulations on the latest technology available, like the NVIDIA DGX A100,” says Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne. “The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days.”
Nvidia says that Microsoft, Amazon, Google, Dell, Alibaba, and many other big cloud service providers are also planning to incorporate individual A100 GPUs into their own offerings. “The adoption and the enthusiasm for Ampere from all of the hyperscalers and computer makers around the world is really unprecedented,” says Huang. “This is the fastest launch of a new data center architecture we’ve ever had, and it’s understandable.”
Just like the larger DGX A100 cluster system, each individual A100 GPU can also be partitioned into up to seven independent instances for smaller compute tasks. These systems won’t come cheap, though. Nvidia’s DGX A100 comes with big performance promises, but systems start at $199,000 for a combination of eight of these A100 chips.
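Seven partitions per GPU is also where the DGX A100’s 56-instance figure comes from, and the sticker price works out to roughly $25,000 per A100 before counting the rest of the system; a quick back-of-the-envelope sketch:

```python
# Instance count and rough per-GPU cost from the DGX A100 specs (sketch).
gpus_per_system = 8
instances_per_gpu = 7
system_price_usd = 199_000

max_instances = gpus_per_system * instances_per_gpu  # 56 instances across the system
price_per_gpu = system_price_usd / gpus_per_system   # ~$24,875, ignoring CPUs, storage, and networking
print(max_instances, round(price_per_gpu))  # 56 24875
```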
It’s not yet clear how Nvidia will bring Ampere to consumer-grade GPUs. Nvidia introduced its Volta architecture, with dedicated artificial intelligence processors (Tensor cores), in much the same way as today’s Ampere unveiling. But Volta didn’t go on to power Nvidia’s line of GeForce consumer products. Instead, Nvidia launched a Volta-powered $2,999 Titan V (which it called “the most powerful PC GPU ever created”) focused on AI and scientific simulation processing, not gaming or creative tasks.
Despite rumors of Volta powering future GeForce cards, Nvidia instead introduced its Turing architecture in 2018, which combined its dedicated tensor cores with new ray-tracing capabilities. Turing went on to power cards like the RTX 2080 instead of Volta, just weeks after Huang said the next line of graphics cards wouldn’t be launching for “a long time.” Nvidia even stripped out the RT and Tensor cores for Turing-powered cards like the GTX 1660 Ti.
New “RTX 3080” cards could be just months away, then, but we still don’t know for sure whether they’ll use this new Ampere architecture. “There’s great overlap in the architecture, that’s without a doubt,” hinted Huang. “The configuration, the sizing of the different elements of the chip is very different.”
Nvidia uses HBM memory for its data center GPUs, and that’s not something the company uses for consumer PC gaming GPUs. The data center GPUs are also focused much more heavily on AI and compute tasks than on graphics. “We’ll be much more heavily biased towards graphics and less towards double-precision floating point,” adds Huang.
Speculation around Nvidia’s Ampere plans has intensified recently, and with the PlayStation 5 and Xbox Series X set to launch with AMD-powered GPUs later this year, Nvidia will surely need something new to offer PC gamers.