Warp Number != Core Number

Hi there:

./deviceQuery tells me that the TX2 supports a maximum of 2048 threads per multiprocessor, and that the warp size is 32. This means that a multiprocessor can handle 2048 / 32 = 64 warps.

If I am not very mistaken, a warp itself can only be scheduled on a single core. That is, 64 warps map to 64 cores. But the TX2 has 128 CUDA cores per multiprocessor. If that is the case, only half of the CUDA cores would be needed (although I am sure this is a wrong conclusion). Am I missing something here?


A CUDA core is a compute unit rather than a “core” in the classical CPU sense.
The unit that maps to a GPU core is a thread, not a warp.
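
To make the arithmetic concrete, here is a small sketch using the figures quoted in the question. It assumes a simplified model in which each thread of a warp occupies one CUDA core for an instruction; the variable names are only illustrative, and real scheduling is more involved than this.

```python
# Figures quoted in the question (from ./deviceQuery on the TX2)
MAX_THREADS_PER_SM = 2048
WARP_SIZE = 32
CORES_PER_SM = 128

# Warps that can be resident (kept in flight) on one multiprocessor
resident_warps = MAX_THREADS_PER_SM // WARP_SIZE

# Simplified assumption: one thread per core, so one warp instruction
# occupies WARP_SIZE cores and the cores are shared by this many warps
warps_filling_all_cores = CORES_PER_SM // WARP_SIZE

print(f"resident warps per multiprocessor: {resident_warps}")
print(f"warps needed to occupy all cores:  {warps_filling_all_cores}")
```

Under this model a warp spreads over 32 cores rather than sitting on one, so the 64 resident warps are not a count of cores in use. Having more resident warps than can execute at once gives the scheduler other work to switch to while some warps wait on memory.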

Usually, we map between N and 2N threads to N cores, depending on the use case.
Since the TX2 has 256 CUDA cores in total, this indicates that around 256 threads can keep all of the TX2 cores active.
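
As a rough sketch of that rule of thumb, the snippet below computes the suggested thread range. It assumes the TX2 has 2 multiprocessors (a figure not stated in this thread) with the 128 cores each mentioned in the question; `thread_range` is a hypothetical helper, not part of any CUDA API.

```python
SM_COUNT = 2        # assumption: number of multiprocessors on the TX2
CORES_PER_SM = 128  # from the question


def thread_range(n_cores):
    """Return the (N, 2N) thread-count range for N cores."""
    return n_cores, 2 * n_cores


total_cores = SM_COUNT * CORES_PER_SM
low, high = thread_range(total_cores)
print(f"{total_cores} cores -> roughly {low} to {high} threads")
```

This is only a lower bound for keeping every core busy; actual kernels often launch far more threads so that memory latency can be hidden.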