SP, DP, tensor "core" in V100

llodds · December 20, 2019, 11:57am

The V100 diagram shows that each SM has 64 SP cores, 32 DP cores, and 8 Tensor Cores. I am wondering whether the SP and DP cores share the same hardware execution units, i.e., are 2 SP cores logically counted as 1 DP core, or does the V100 have separate hardware units for SP and DP instructions?

mnicely · December 21, 2019, 2:03pm

According to https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

Similar to Pascal GP100, the GV100 SM incorporates 64 FP32 cores and 32 FP64 cores per SM.
However, the GV100 SM uses a new partitioning method to improve SM utilization and overall
performance. Note that the GP100 SM is partitioned into two processing blocks, each with 32
FP32 Cores, 16 FP64 Cores, an instruction buffer, one warp scheduler, two dispatch units, and a
128 KB Register File. The GV100 SM is partitioned into four processing blocks, each with 16 FP32
Cores, 8 FP64 Cores, 16 INT32 Cores, two of the new mixed-precision Tensor Cores for deep
learning matrix arithmetic, a new L0 instruction cache, one warp scheduler, one dispatch unit,
and a 64 KB Register File. Note that the new L0 instruction cache is now used in each partition to
provide higher efficiency than the instruction buffers used in prior NVIDIA GPUs. (See the Volta
SM in Figure 5).

I take that to mean they are separate units.
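
For a sense of scale, here is a rough sketch of what those per-SM counts imply for peak throughput. The per-SM counts come from the excerpt above and the first post. The SM count (80), the boost clock (1530 MHz), and the 64 FMAs per clock per Tensor Core are assumptions taken from the published Tesla V100 specifications, not from this thread, and the helper name `peak_tflops` is purely illustrative.

```python
# Per-SM unit counts, as quoted in this thread.
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32
TENSOR_CORES_PER_SM = 8

# Assumptions NOT stated in this thread (published Tesla V100 figures).
NUM_SMS = 80                 # SMs enabled on a Tesla V100
BOOST_CLOCK_HZ = 1.53e9      # 1530 MHz boost clock
FLOPS_PER_FMA = 2            # one fused multiply-add counts as 2 FLOPs
TENSOR_FMA_PER_CLOCK = 64    # FMAs per Tensor Core per clock


def peak_tflops(units_per_sm, fma_per_unit_per_clock=1):
    """Theoretical peak in TFLOPS if every unit issues FMAs on every clock."""
    flops_per_clock = (
        NUM_SMS * units_per_sm * fma_per_unit_per_clock * FLOPS_PER_FMA
    )
    return flops_per_clock * BOOST_CLOCK_HZ / 1e12


fp32_tflops = peak_tflops(FP32_CORES_PER_SM)
fp64_tflops = peak_tflops(FP64_CORES_PER_SM)
tensor_tflops = peak_tflops(TENSOR_CORES_PER_SM, TENSOR_FMA_PER_CLOCK)

print(f"FP32 peak:   {fp32_tflops:.1f} TFLOPS")
print(f"FP64 peak:   {fp64_tflops:.1f} TFLOPS")
print(f"Tensor peak: {tensor_tflops:.1f} TFLOPS")
```

Under those assumptions this works out to roughly 15.7, 7.8, and 125 TFLOPS. These are theoretical ceilings only. The 2:1 FP32-to-FP64 ratio follows directly from the unit counts and does not by itself show whether the units are physically shared.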

llodds · December 25, 2019, 11:38pm

Thanks so much for the information. Suppose my program uses all the SP cores in every SM. Is it possible to use the DP cores at the same time? I am wondering if I have to program in mixed precision in order to fully exploit the performance of the V100.
