CUDA threads and warps

sachi · January 16, 2015, 10:35am

Hello all,

I am new to GPU programming and have some questions about basic CUDA and GPU hardware concepts.

When we launch threads, they are divided into warps at runtime. I am quite confused about how the GPU executes instructions. Does each thread execute one instruction, or does a group of threads (a warp) execute one instruction? And how are the CUDA cores involved when an instruction is executed?

I am also using a Jetson TK1, and I have read that it has only one SM. How many blocks can that SM hold?

Thank you

Josh_Holloway · February 17, 2015, 7:26pm

Okay, so I’m new at this too, but I’d like to try to help as much as I can.

The Jetson TK1 has a single Kepler SM (192 CUDA cores).

When you launch your kernel, the GPU will map each block onto that SM. The scheduling is done automatically. I am not sure whether all threads in a block must complete before another block can run on the SM. I would assume they must, because if another block were scheduled onto the SM, the unfinished threads of the first block could not keep executing.

So inside each block that is executing on the SM, the block is split up into warps (groups of 32 threads). I know that if one of the threads in a warp must wait on a thread from another warp to finish, the warp will be “context switched” with another warp. Warps are swapped back and forth like this, and the GPU does all of that scheduling for you.

Each thread is mapped to a single CUDA core, and the SINGLE INSTRUCTION performed inside that thread is the set of operations defined in the kernel’s body. This is the concept of SIMD: the same instruction (the kernel) is executed on MULTIPLE DATA (the input arguments to the kernel, as processed by that specific thread on one single CUDA core).

Does that help at all?

sachi · May 8, 2015, 2:11pm

Thank you very much for the info.

Every thread of a kernel is mapped to a specific core, and the thread uses the resources of that CUDA core. If my kernel has 192 threads, those 192 threads will be mapped to the 192 CUDA cores (a single Kepler SM). On a compute capability 3.x GPU, there can be a maximum of 1024 threads per block. So what happens to the excess threads (1024 - 192)?

Please let me know if my understanding is wrong.

Seeky · May 12, 2015, 5:24am

As long as the first 192 threads have some work to do, they will keep running. As soon as one of the 192 currently running threads needs to wait, for example on the load/store units (there are just 32), another thread is started and the waiting one is simply suspended. The number of concurrently running threads is limited because each thread's context needs to be stored until the thread completes, and the space for storing thread data is finite. There is no guarantee that threads 0-191 will finish first. The order can be arbitrary, depending on how the threads can be scheduled most efficiently.

Always keep in mind that new threads are only started when a whole warp is suspended. If one of the 32 threads is working, the other 31 thread slots are still blocked by the threads that are grouped into a warp with the working one.
