Deep Learning Card Comparison – 2022
NVIDIA’s cards are still the best choice for deep learning, and NVIDIA offers a wide variety of cards and modules to choose from. We collected some specifications that may be useful when choosing among them.
Name | Form Factor | Core | Core Architecture | Chip Manufacturing | Die Size (mm^2) | Transistor# | SM | Tensor Cores | FP64 CUDA Cores | FP32 CUDA Cores | Memory | TDP(W) | FP64 FLOPS | FP32 FLOPS | TF32 FLOPS | FP16 FLOPS | BF16 FLOPS | INT16 FLOPS | INT8 FLOPS | INT4 FLOPS | Fmax (GHz) | Memory Bandwidth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nvidia H100 SXM | SXM5.0 Card | 1x GH100 | Hopper | TSMC 4N | 814 | 80B | 132 | 528 | 8448 | 16896 | 80GB HBM3 | 700W | 30T 60T(Tensor Core) | 60T 1000/500T(Tensor Core) | 500T | 120T 1000T(Tensor Core) | 1000T(Tensor Core) | | | | | 3TB/s |
Nvidia H100 PCIe | PCIe Card | 1x GH100 | Hopper | TSMC 4N | 814 | 80B | 114 | 456 | 7296 | 14592 | 80GB HBM2e | 350W | 24T 48T(Tensor Core) | 48T 400T(Tensor Core) | 400T(Tensor Core) | 800T(Tensor Core) | 800T(Tensor Core) | | | | | 2TB/s |
Nvidia A100 80GB | SXM4.0 Card | 1x GA100 | Ampere | TSMC N7 | 826 | 54.2B | 108 | 432 | 3456 | 6912 | 80GB HBM2e | 400W | 9.7/TC: 19.5 | 19.5T | TC: 156/312* | TC: 312/624* | TC: 312/624* | | TC: 624/1248* | TC: 1248/2496* | 1.41 | 2TB/s 5Kb 3.2Gbps |
Nvidia A100 40GB | SXM4.0 Card | 1x GA100 | Ampere | TSMC N7 | 826 | 54.2B | 108 | 432 | 3456 | 6912 | 40GB HBM2 | 400W | 9.7/TC: 19.5 | 19.5T | TC: 156/312* | TC: 312/624* | TC: 312/624* | | TC: 624/1248* | TC: 1248/2496* | 1.41 | 1.6TB/s 5Kb 2.56Gbps |
Nvidia A100-PCIe | PCIe Card FHFL Dual-Slot | 1x GA100-883AA-A1 | Ampere | TSMC N7 | 826 | 54.2B | 108 | 432 | 3456 | 6912 | 40/80GB HBM2 | 250W | ||||||||||
Nvidia A40 | PCIe Card FHFL Dual-Slot | 1x GA102 | Ampere | Samsung 8nm | 628.4 | 28.3B | | | | 10752 | GDDR6 : 48GB | 300W | 335G | 38T | 75 | 151 | NA | | 302 | NA | NA | 696GB/s 384b 14.5Gbps |
Nvidia RTX A6000 | PCIe Card FHFL Dual-Slot | 1x GA102 | Ampere | Samsung 8nm | 628.4 | 28.3B | 84 | 336(v3) | | 10752 | GDDR6 : 48GB | 300W | 335G 1,210G(1:32) | 38.7T | 75 | 38.7T(1:1) | NA | | 302 | NA | 1.8 | 768GB/s 384b 16Gbps |
Nvidia RTX 3090 Ti | PCIe Card FHFL Three-Slot | 1x GA102-350-A1 | Ampere | Samsung 8nm | 628.4 | 28.3B | 84 | 336(v3) | | 10752 | GDDR6X : 24GB | 450W | 625.0G(1:64) | 40T | | 40T(1:1) | | | | | 1.86 | 1,008GB/s 384b |
Nvidia RTX 3090 | PCIe Card FHFL Three-Slot | 1x GA102-300-A1 | Ampere | Samsung 8nm | 628.4 | 28.3B | 82 | 328(v3) | | 10496 | GDDR6X : 24GB | 350W | 556.0G(1:64) | 35.58T | TC: 35.6/71* | 35.6T TC: 71/142* (w/ FP32+) TC: 142/284* (w/ FP16+) | 35.6T TC: 71/142* (w/ FP32+) | | TC: 284/568* | TC: 568/1136* | 1.695 | 936GB/s 384b 19.5Gbps |
Nvidia RTX 3080 Ti | PCIe Card FHFL Dual-Slot | 1x GA102-225-A1 | Ampere | Samsung 8nm | 628.4 | 28.3B | 80 | 320(v3) | | 10240 | GDDR6X : 12GB | 350W | 532.8G | 34.1T | | 34.1T | | | | | 1.67 | 912.4GB/s 384-bit |
Nvidia RTX 3080 | PCIe Card FHFL Dual-Slot | 1x GA102-200-KD-A1 | Ampere | Samsung 8nm | 628.4 | 28.3B | 68 | 272(v3) | | 8704 | GDDR6X : 10GB | 320W | 465.1G(1:64) | 29.77T | | 29.77T | | | | | 1.71 | 760GB/s 320-bit |
Nvidia RTX 3070 Ti | PCIe Card FHFL Dual-Slot | 1x GA104-400-A1 | Ampere | Samsung 8nm | 392.5 | 17.4B | 48 | 192(v3) | | 6144 | GDDR6X : 8GB | 290W | 339.8G(1:64) | 21.75T | | 21.75T | | | | | 1.77 | 256-bit |
Nvidia RTX 3070 | PCIe Card FHFL Dual-Slot | 1x GA104 | Ampere | Samsung 8nm | 392.5 | 17.4B | 46 | 184(v3) | | 5888 | GDDR6 : 8GB | 220W | 317.4G(1:64) | 20.31T | TC: 20.3/40.6* | 20.31T TC: 40.6/81.3* (w/ FP32+) TC: 81.3/162.6* (w/ FP16+) | 20.3T TC: 40.6/81.3* (w/ FP32+) | | TC: 162.6/325.2* | TC: 325.2/650.4* | 1.725 | 448GB/s 256-bit 14Gbps |
Nvidia RTX 3060 Ti | PCIe Card FHFL Dual-Slot | 1x GA104-200-A1 | Ampere | Samsung 8nm | 392.5 | 17.4B | 38 | 152(v3) | | 4864 | GDDR6 : 8GB | 200W | 253.1G(1:64) | 16.20T | | 16.20T | | | | | 1.67 | 448GB/s 256-bit |
Nvidia RTX 3060 | PCIe Card FHFL Dual-Slot | 1x GA106-300-A1 | Ampere | Samsung 8nm | 276 | 13.3B | 28 | 112(v3) | | 3584 | GDDR6 : 12GB | 170W | 199.0G(1:64) | 12.74T | | 12.74T | | | | | 1.78 | 360.0GB/s 192-bit |
Nvidia TITAN RTX | PCIe Card FHFL Dual-Slot | 1x TU102-400-A1 | Turing | 12 nm FFN | 754 | 18.6B | 72 | 576 | | 4608 | GDDR6 : 24GB | 280W | 509.8G(1:32) | 16.30T | | 32.62T | | | | | 1.77 | 672GB/s 384-bit 14Gbps |
Nvidia RTX 2080 Ti | PCIe Card FHFL Dual-Slot | 1x TU102-300 | Turing | TSMC 12nm | 754 | 18.6B | 68 | 544 | | 4352 | GDDR6 : 11GB | 250W | 420G(1:32) | 13.45T | | 26.90T | | | | | 1.75 | 616.0GB/s 352-bit 14Gbps |
Nvidia RTX 2070 Super | PCIe Card FHFL Dual-Slot | 1x TU104 | Turing | TSMC 12nm | 545 | 13.6B | 40 | 320 | | 2560 | GDDR6 : 8GB | 215W | 283.2G(1:32) | 9.062T | | 18.12T | | | | | 1.77 | 448GB/s 256-bit 14Gbps |
Nvidia RTX 2070 | PCIe Card FHFL Dual-Slot | 1x TU106-400A-A1 | Turing | TSMC 12nm | 445 | 10.8B | 36 | 288 | | 2304 | GDDR6 : 8GB | 175W | 233.3G(1:32) | 7.465T | | 14.93T | | | | | | 448.0GB/s 256-bit |
Nvidia V100S-PCIe | PCIe Card FHFL Dual-Slot | 1x GV100 | Volta | TSMC12 FFN | 815 | 21.1B | 80 | 640(v1) | 2560 | 5120 | HBM2 : 32GB | 250W | 8.2 | 16.4 | / | TC: 130 | / | | 62 | / | 1.64 | 1.134TB/s 4Kb 2.2Gbps |
Nvidia V100 | SXM3.0 Card | 1x GV100 | Volta | TSMC12 FFN | 815 | 21.1B | 80 | 640(v1) | 2560 | 5120 | HBM2 : 32/16GB | 300W | 7.8 | 15.7 | / | TC: 125 | / | | 62 | / | 1.53 | 900GB/s 4Kb 1.75Gbps |
Nvidia V100-PCIe | PCIe Card FHFL Dual-Slot | 1x GV100 | Volta | TSMC12 FFN | 815 | 21.1B | 80 | 640(v1) | 2560 | 5120 | HBM2 : 32/16GB | 250W | 7 | 14 | / | TC: 112 | / | | 62 | / | 1.4 | 900GB/s 4Kb 1.8Gbps |
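
Several columns in the table can be cross-checked against each other. The sketch below is not from the original post. It assumes the usual rule of thumb that each FP32 CUDA core retires one fused multiply-add (2 FLOPs) per clock at Fmax, and that peak memory bandwidth is bus width times per-pin data rate divided by 8. The function names are hypothetical, and reading "5Kb" as a 5120-bit bus is also an assumption.

```python
def peak_fp32_tflops(fp32_cuda_cores: int, fmax_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return 2 * fp32_cuda_cores * fmax_ghz / 1000.0


def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and per-pin data rate (Gbps)."""
    return bus_width_bits * data_rate_gbps / 8.0


# Inputs are taken from the table rows above.
print(f"A100 FP32:     {peak_fp32_tflops(6912, 1.41):.2f} TFLOPS")    # table lists 19.5T
print(f"RTX 3090 FP32: {peak_fp32_tflops(10496, 1.695):.2f} TFLOPS")  # table lists 35.58T
print(f"RTX 3090 BW:   {memory_bandwidth_gbs(384, 19.5):.0f} GB/s")   # table lists 936GB/s
# Assumption: "5Kb" in the A100 80GB row means a 5120-bit bus.
print(f"A100 80GB BW:  {memory_bandwidth_gbs(5120, 3.2):.0f} GB/s")   # table lists 2TB/s
```

These are theoretical peaks only. Sustained throughput depends on clocks under load, memory access patterns, and whether the workload can use the Tensor Cores, so the numbers are best used to sanity-check the table rather than to predict training speed.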