Nvidia DGX


Nvidia DGX is a line of Nvidia-produced servers and workstations that specialize in using general-purpose GPU computing (GPGPU) to accelerate deep learning applications.

DGX-1

DGX-1 servers feature eight GPUs based on Pascal or Volta daughter cards with HBM2 memory, connected by an NVLink mesh network.
The product line is intended to bridge the gap between GPUs and AI accelerators, offering specific features that specialize the device for deep learning workloads.
The initial Pascal-based DGX-1 delivered 170 teraflops of half-precision (FP16) processing, while the Volta-based upgrade increased this to 960 teraflops.
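The aggregate figures above follow from multiplying the per-card peak throughput by the eight GPUs in the system. A minimal sketch of that arithmetic, assuming the per-card FP16 peaks from Nvidia's public spec sheets (~21.2 TFLOPS for the Pascal P100, 120 TFLOPS tensor for the Volta V100), which are not stated in this article:

```python
# Sketch: how the quoted DGX-1 aggregate FP16 figures arise from
# per-GPU peaks. Per-card numbers are assumptions from public spec
# sheets; real sustained throughput is lower than these peaks.
def aggregate_tflops(num_gpus: int, per_gpu_tflops: float) -> float:
    """Peak aggregate throughput: number of GPUs times per-GPU peak."""
    return num_gpus * per_gpu_tflops

pascal_total = aggregate_tflops(8, 21.2)   # ~169.6, quoted as 170
volta_total = aggregate_tflops(8, 120.0)   # 960.0

print(pascal_total, volta_total)
```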

DGX-2

The successor of the Nvidia DGX-1 is the Nvidia DGX-2, which uses sixteen 32 GB V100 cards in a single unit. This raises performance to up to 2 petaflops with 512 GB of shared memory for tackling larger problems, and uses NVSwitch to speed up internal communication.
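Both headline DGX-2 numbers can be reproduced from the sixteen-card configuration. A minimal sketch, assuming a per-card FP16 tensor peak of ~125 TFLOPS taken from Nvidia's V100 spec sheet rather than from this article:

```python
# Sketch of the DGX-2 aggregate figures from its 16 V100 (32 GB) cards.
# The per-card tensor peak (~125 TFLOPS) is an assumption from the
# public V100 spec sheet.
num_gpus = 16
hbm_per_gpu_gb = 32
tensor_tflops_per_gpu = 125.0

shared_memory_gb = num_gpus * hbm_per_gpu_gb           # 512 GB shared
peak_pflops = num_gpus * tensor_tflops_per_gpu / 1000  # 2.0 petaflops

print(shared_memory_gb, peak_pflops)
```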
Additionally, there is a higher-performance version of the DGX-2, the DGX-2H; its notable difference is the replacement of the dual Intel Xeon Platinum 8168 CPUs (2.7 GHz) with dual Intel Xeon Platinum 8174 CPUs (3.1 GHz).

DGX A100

Announced and released on May 14, 2020, the DGX A100 is the third generation of DGX server, including eight Ampere-based A100 accelerators. Also included are 15 TB of PCIe gen 4 NVMe storage, two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and Mellanox-powered HDR InfiniBand interconnects. The initial price for the DGX A100 was US$199,000.

Accelerators

Comparison of accelerators used in DGX systems. The (omitted) comparison table covers: architecture, FP32 CUDA cores, boost clock, memory clock, memory bus width, memory bandwidth, VRAM, single-precision performance, double-precision performance, INT8 tensor, FP16 tensor, TF32 tensor, interconnect, GPU, GPU die size, transistor count, TDP, and manufacturing process.