GPU core architecture diagram: the core, codenamed Block 100, uses an ALU cluster built on a unified compute architecture, with 128 FP32 units and 8 tensor cores per SM. Each GPC contains 10 SMs, and each GPU die contains two GPCs, for a total of 2560 FP32 CUDA cores.

The core supports Multi-Instance GPU (MIG) technology. A GPU die can be divided into up to 20 independent GPU instances: a GPC, a TPC, or a single SM can each become one instance, so a die can be partitioned into 2, 10, or 20 instances, giving three partitioning modes. Instances of different granularities can run simultaneously and be mixed on the same die, and each instance has its own memory, cache, and streaming multiprocessors. With multiple GPU instances, GPU cloud acceleration can be provided to a large number of clients, for example one instance per mobile-phone account, with different resources allocated to different accounts. Lower-spec clients can use GPU cloud acceleration to reduce game stutter, and the same scheme can be applied to school servers to provide GPU compute to each client, and so on.

Doubling the number of ZRLinks on this GPU core also doubles the interconnect bandwidth. An L1.5 cache is also introduced between the SMs and the L2 cache to buffer traffic between them.
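To make the three partitioning modes concrete, here is a minimal Python sketch (not real MIG tooling; the class and function names are illustrative) that splits one die's resources into equal instances at GPC, TPC, or SM granularity, using the unit counts described above.

```python
from dataclasses import dataclass

# Per-die resources described above: 2 GPCs x 10 SMs x 128 FP32 units, 8 tensor cores per SM.
SMS_PER_DIE = 20
FP32_PER_SM = 128
TENSOR_CORES_PER_SM = 8

# Partition granularity -> number of instances per die (GPC / TPC / SM modes).
MODES = {"gpc": 2, "tpc": 10, "sm": 20}

@dataclass
class GpuInstance:
    mode: str
    sms: int
    fp32_cores: int
    tensor_cores: int

def partition(mode: str) -> list[GpuInstance]:
    """Split one die into equal instances at the chosen granularity."""
    count = MODES[mode]
    sms = SMS_PER_DIE // count
    return [
        GpuInstance(mode, sms, sms * FP32_PER_SM, sms * TENSOR_CORES_PER_SM)
        for _ in range(count)
    ]

if __name__ == "__main__":
    for mode in MODES:
        instances = partition(mode)
        first = instances[0]
        print(f"{mode}: {len(instances)} instances, "
              f"{first.sms} SM / {first.fp32_cores} FP32 / {first.tensor_cores} TC each")
```

Running it prints 2, 10, and 20 instances per die with 10, 2, and 1 SMs each, matching the three modes listed above.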
Four of the memory controllers on the GPU drive HBM3 high-bandwidth memory; they sit close to the memory stacks, which keeps the wiring simple and saves production cost. The memory controllers on the left and right ends drive GDDR6X memory. For deep-learning training the GPU prefers HBM3 high-bandwidth memory, because the data models used in training are generally not very large, and with mixed-precision support (FP16 + FP8) the pressure on video-memory capacity during training is usually not severe.
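As a rough illustration of that capacity point, the sketch below (purely illustrative numbers, not measurements) compares the footprint of the same weight and activation tensors when stored in FP32, FP16, and FP8.

```python
# Back-of-envelope sketch: storing the same tensors in FP16 or FP8
# halves or quarters their footprint compared with FP32.

BYTES = {"FP32": 4, "FP16": 2, "FP8": 1}

def tensor_gib(num_values: float, fmt: str) -> float:
    """Memory needed to hold `num_values` elements in the given format, in GiB."""
    return num_values * BYTES[fmt] / 2**30

if __name__ == "__main__":
    params = 1e9        # e.g. a 1-billion-parameter model (illustrative)
    activations = 4e9   # cached activation values for one step (illustrative)
    for fmt in BYTES:
        total = tensor_gib(params + activations, fmt)
        print(f"{fmt}: ~{total:.1f} GiB for weights + cached activations")
```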
For AI inference and for training some large models, the amount of video memory is particularly important, because it directly determines whether a model can be trained or run on the GPU at all. This is what the GDDR6X memory controllers on the left and right ends are for. As deep-learning algorithms iterate, GPUs need more and more memory. For example, the recently leaked Diffusion-Model-based AI painting model, which relies on single-precision floating-point performance, cannot even generate a 720p image within 24 GB of video memory. This made me think hard, and I believe we can support HBM3 and GDDR6X memory on one core at the same time: GDDR6X is cheaper than HBM3 and still has much higher bandwidth than the CPU's shared memory, so it is a cost-effective fallback that can be used when HBM3 capacity runs out. Both are packaged in the same SXM module, and the SXM module's power-delivery circuitry is moved onto the motherboard to increase compute density.
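A tiny Python sketch of that fallback idea (a hypothetical driver-side allocation policy, not an existing API; all names and capacities are illustrative): buffers are placed in HBM3 first and spill to GDDR6X only once HBM3 is exhausted.

```python
class TieredMemory:
    """Two-tier device memory: fast HBM3 pool plus a cheaper GDDR6X pool."""

    def __init__(self, hbm3_bytes: int, gddr6x_bytes: int):
        self.free = {"HBM3": hbm3_bytes, "GDDR6X": gddr6x_bytes}

    def alloc(self, size: int) -> str:
        """Return the pool the buffer was placed in, preferring HBM3."""
        for pool in ("HBM3", "GDDR6X"):   # fast tier first, then the cheap tier
            if self.free[pool] >= size:
                self.free[pool] -= size
                return pool
        raise MemoryError("out of device memory")

if __name__ == "__main__":
    GiB = 2**30
    mem = TieredMemory(hbm3_bytes=16 * GiB, gddr6x_bytes=24 * GiB)  # illustrative sizes
    for i, size in enumerate([6 * GiB, 6 * GiB, 6 * GiB, 12 * GiB]):
        print(f"buffer {i}: {size // GiB} GiB -> {mem.alloc(size)}")
```

In this run the first two buffers land in HBM3 and the later ones spill to GDDR6X, which is the behavior described above: HBM3 for bandwidth, GDDR6X as the cheaper capacity extension.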