This is my idea for a GPU multi-die design.


This is the core architecture diagram of the new GPU (I have given it the codename Diamond 100).
The four memory controllers at the top control HBM2 (HBM2E) memory; placing them close to the memory stacks simplifies routing.
The two memory controllers at the upper left and upper right control GDDR6, so additional memory can be attached.
The host interface is PCIe 5.0. The four ZRLinks above it act as a high-speed bus and high-speed hub, making it easy to exchange data inside and outside the die and to interconnect multiple GPUs.
The SM is based on the Ampere SM: each SM has 32 FP64 units, 64 FP32 units, 64 INT32 units, and 4 Tensor Cores. Each GPC has 10 SMs, and each GPU die has two GPCs, for a total of 1280 CUDA cores.

The design supports Multi-Instance GPU (MIG) technology: one GPU die can be divided into up to 20 independent GPUs. A GPC, a TPC, or a single SM can each become a GPU instance, so a die can be split into 2, 10, or 20 instances, three modes in total. Instances in any mode can run simultaneously, and the modes can be mixed, so instances of different granularities can run at the same time. Each instance has its own memory, cache, and streaming multiprocessors.

With many GPU instances, the die can provide GPU cloud acceleration to a large number of clients, for example one instance per mobile account, with resources allocated according to each account. On lower-spec clients, cloud acceleration can make game rendering less choppy; a school server could likewise provide GPU compute to each of its clients, and of course there are more use cases.

Finally, doubling the number of ZRLinks on this GPU core also doubles the interconnect bandwidth.
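A short sketch of the arithmetic above: the CUDA-core total and the three MIG granularities follow directly from the per-SM and per-GPC counts. This is a toy model, not vendor code; the assumption that a TPC holds 2 SMs is mine (it matches NVIDIA's published Ampere layout), and the mixed-partition enumeration is just one way to read the claim that the modes can be combined.

```python
# Toy model of the Diamond 100 die described above. Unit counts
# follow the text; SMS_PER_TPC = 2 is my assumption, borrowed from
# NVIDIA's published Ampere layout.

SMS_PER_GPC = 10
GPCS_PER_DIE = 2
FP32_PER_SM = 64           # "CUDA cores" (FP32 units) per SM
SMS_PER_TPC = 2            # assumption (Ampere-style TPC)

total_sms = GPCS_PER_DIE * SMS_PER_GPC       # 2 * 10 = 20
total_cuda = total_sms * FP32_PER_SM         # 20 * 64 = 1280

# Pure MIG modes: every instance is one GPC, one TPC, or one SM.
pure_modes = {
    "GPC": GPCS_PER_DIE,                     # 2 instances
    "TPC": total_sms // SMS_PER_TPC,         # 10 instances
    "SM": total_sms,                         # 20 instances
}

def mixed_partitions():
    """Yield (gpc, tpc, sm) instance counts that exactly cover the
    die's SMs -- one reading of 'modes can be mixed'."""
    for g in range(GPCS_PER_DIE + 1):
        remaining = total_sms - g * SMS_PER_GPC
        for t in range(remaining // SMS_PER_TPC + 1):
            yield (g, t, remaining - t * SMS_PER_TPC)

partitions = list(mixed_partitions())
print(total_cuda)   # 1280
print(pure_modes)   # {'GPC': 2, 'TPC': 10, 'SM': 20}
```

The three pure modes from the text show up as the partitions (2, 0, 0), (0, 10, 0), and (0, 0, 20), alongside mixed layouts such as one whole GPC plus five TPC instances (1, 5, 0).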