What is the difference between SP and CUDA core?

Hi all,

As we know, the GTX 1070 contains 1920 CUDA cores and 15 streaming multiprocessors, so each SM has 128 CUDA cores. However, according to NVIDIA's ‘CUDA_C_Programming_Guide’, the maximum number of resident threads per multiprocessor should be 2048.
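These figures can be checked at runtime with cudaGetDeviceProperties (the CUDA-core count per SM is not reported directly by the runtime API and has to be looked up per architecture); a minimal sketch, assuming a single CUDA-capable device:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // query device 0
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device:                      %s\n", prop.name);
    std::printf("Multiprocessors (SMs):       %d\n", prop.multiProcessorCount);
    std::printf("Max resident threads per SM: %d\n", prop.maxThreadsPerMultiProcessor);
    std::printf("Warp size:                   %d\n", prop.warpSize);
    return 0;
}
```

On a GTX 1070 this should report 15 SMs and 2048 resident threads per SM (i.e. 64 resident warps).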

Does it mean that one CUDA core contains 16 resident threads, so a CUDA core is like 16 SPs combined?

If so, is communication between threads on different CUDA cores different from communication between threads on the same CUDA core?

best regards

The most commonly used meaning of “core” is identical to the most commonly used meaning of SP (streaming processor) - they both refer to the functional units that support the single precision floating point add, multiply, and multiply-add instructions.

It’s not correct to associate a thread of execution with a particular CUDA core. That’s not how the GPU works. A GPU SM includes a collection of functional units that each support different types of instructions. For example, the LD/ST unit (load-store unit) supports LD and ST instructions. If a particular thread of execution has an LD instruction in it, that LD instruction will be issued to an LD/ST unit, not a CUDA core, and not an SP given the above commonly used definitions. Therefore threads are not uniquely associated with cores or SPs. In this sense the usage of the word “core” in typical GPU terminology is quite different from the typical usage in CPU terminology. Therefore understanding GPU thread-level execution requires that you divorce any notion of a thread of execution being associated with a particular core.
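As an illustration (a minimal sketch, not tied to any particular GPU; the kernel and file names are just placeholders), a trivial kernel whose threads load operands, perform a single-precision multiply-add, and store the result will have all of those instruction types in a single thread’s instruction stream, issued to different functional units:

```cpp
#include <cuda_runtime.h>

// Each thread loads its operands (LD/ST units), performs a single-precision
// multiply-add (FP32 units, i.e. "CUDA cores"/SPs), and stores the result
// (LD/ST units again). One thread, several kinds of functional units.
__global__ void fma_kernel(const float* __restrict__ a,
                           const float* __restrict__ b,
                           float* __restrict__ c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] * b[i] + c[i];  // typically compiles to LDG + FFMA + STG
    }
}
```

Compiling with something like `nvcc -arch=sm_61 -c kernel.cu` and dumping the machine code with `cuobjdump -sass kernel.o` should show the LDG, FFMA, and STG instructions in that one kernel.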

3 Likes

Thank you so much! @Robert_Crovella

So the number of CUDA cores and other functional units only determines the maximum number of ACTIVE warps; it is the number of registers and the amount of shared memory (and maybe other resources) that determines the actual number of resident warps for one multiprocessor, right?

And if so, what is it that determines the maximum number (64 for the GTX 1070) of RESIDENT warps per multiprocessor?

best regards

The maximum number of resident warps is a hardware limit. That’s why it is presented that way in the table. If you’re asking for some unpublished detail of the design of the SM that gives rise to that limit, I don’t have that info to share. The number of resident warps for a particular code will be a function of that code design against various hardware limits (such as registers per thread vs. maximum number of registers per SM). If none of the other limiting factors come into play, then the code should be able to achieve the maximum stated limit. Active is a new term you’ve brought into the discussion just now, so we’d have to carefully define that first.
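For a concrete kernel, the runtime’s occupancy API reports how many blocks (and hence warps) can actually be resident per SM once register and shared-memory usage are taken into account; a minimal sketch, where dummy_kernel is only a placeholder:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; its register/shared-memory usage is what the
// occupancy query below is based on.
__global__ void dummy_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int blockSize = 256;  // 8 warps per block
    int maxBlocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &maxBlocksPerSM, dummy_kernel, blockSize, /*dynamicSMemSize=*/0);

    int residentWarps = maxBlocksPerSM * blockSize / prop.warpSize;
    int maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    std::printf("Resident warps per SM for this kernel: %d (hardware limit: %d)\n",
                residentWarps, maxWarps);
    return 0;
}
```

If the kernel’s register and shared-memory usage are modest, the reported value should reach the hardware limit (64 warps per SM on a GTX 1070); otherwise it will be lower.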

Thank you! @Robert_Crovella

Yes I was asking for the details of the design.

By ‘active warps’ I meant the warps that are executing. Because a multiprocessor only has 4 warp schedulers, at most 4 warps are executing in any clock cycle.

best regards

I won’t be able to share non-public details of GPU design.

All execution units are pipelined, which means that in any clock cycle many warps (more than 4) may be in various stages of execution. I think you are talking about “issued” warps. Even with “issued” warps, some GPUs have dual-issue warp schedulers, so some GPUs (Kepler comes to mind) can issue more than 4 warps in a clock cycle.

1 Like

Now I understand. Thank you for your help! @Robert_Crovella