- I am confused by the terms SIMD, SIMT and SPMD with regard to CUDA.
Which of these is CUDA?
All threads of a warp execute the same instruction… suggests that it is SIMT.
All threads of a warp execute the same instruction on different data… suggests that it is SIMD and SIMT.
All threads in a grid execute the same kernel function (program)… suggests that it is SPMD.
Or have I developed a wrong understanding of SIMT?
- I am not able to find anything regarding SIMT. Could you please suggest some reading on this?
Thanks and Regards
SIMD and SIMT are really two perspectives on the same concept. In both cases your hardware is designed to efficiently run an operation on many data elements simultaneously.
Traditional SIMD architectures require you to pack your data into vector registers and run instructions on entire vector registers at a time. (For example, 4 floats per register with SSE.) This is pretty straightforward for the easy case, but gets more complicated when you need to apply an instruction to only some elements of a vector register or split and take different code paths for different elements. You have to manage the vector nature of the processor at a high level and deal with all the special cases.
The SIMT approach of CUDA is mostly a different software view on the same hardware. You program a kernel to describe the operation of a single “thread” which you could generally map onto a single element of a vector register in a SIMD architecture. You focus on the operation on one data element (so to speak) and the runtime extends that out to many data elements based on the block and grid configuration. Branches and edge cases are handled for you automatically, although there is still an efficiency penalty for branched code, just like in SIMD.
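As a minimal sketch of that per-thread view (the kernel name `scale_positive` and the host helper `run_scale_positive` are made up for illustration, and error checking is omitted): the kernel is written as if it handled a single element, the bounds check covers the partially filled last block, and the data-dependent `if` is the kind of branch that can split the threads of one warp.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Written from the point of view of ONE thread handling ONE element.
__global__ void scale_positive(const float* in, float* out, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i >= n) return;  // edge case: the last block may be only partly used

    // Threads of the same warp may disagree on this condition. The hardware
    // then runs both paths with the non-participating threads masked off,
    // which is where the efficiency penalty for branched code comes from.
    if (in[i] > 0.0f)
        out[i] = in[i] * factor;
    else
        out[i] = in[i];
}

// Hypothetical host helper: copy input to the GPU, launch, copy result back.
std::vector<float> run_scale_positive(const std::vector<float>& host_in, float factor)
{
    int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;

    float* d_in = nullptr;
    float* d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    // The block and grid configuration extends the one-element description
    // to all n elements.
    int threads = 128;  // assumed block size
    int blocks = (n + threads - 1) / threads;
    scale_positive<<<blocks, threads>>>(d_in, d_out, n, factor);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Note that nothing in the kernel mentions vector width or masks; with SSE you would have to build a comparison mask and blend the two results yourself.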
At the hardware level, SIMT is pretty similar to SIMD. The GTX 285 is a 30-core chip (each "core" here being a multiprocessor) able to complete a 32-wide SIMD instruction every 4 clock cycles. (The GTX 480 is a 15-core chip able to complete two 32-wide SIMD instructions every 2 clock cycles.)
You could argue that the CUDA grouping of threads into blocks shows aspects of SPMD, since blocks have very limited communication and synchronization options. Every block could be treated as a separate “process” I suppose.
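To illustrate the block-as-"process" idea, here is a rough sketch (the names `block_sums`, `run_block_sums` and `BLOCK_SIZE` are invented for this example, and error checking is omitted). Threads inside a block cooperate through shared memory and `__syncthreads()`, while each block produces its own partial sum without talking to any other block.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 128  // assumed; must be a power of two for the loop below

// Every block sums its own slice of the input, independently of other blocks.
__global__ void block_sums(const float* in, float* partial, int n)
{
    __shared__ float scratch[BLOCK_SIZE];  // visible only within this block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    scratch[tid] = (i < n) ? in[i] : 0.0f;  // pad the ragged last block with 0
    __syncthreads();                        // barrier for this block only

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            scratch[tid] += scratch[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        partial[blockIdx.x] = scratch[0];   // one result per block
}

// Hypothetical host helper: returns one partial sum per block.
std::vector<float> run_block_sums(const std::vector<float>& host_in)
{
    int n = static_cast<int>(host_in.size());
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    std::vector<float> host_partial(blocks);
    if (n == 0) return host_partial;

    float* d_in = nullptr;
    float* d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sums<<<blocks, BLOCK_SIZE>>>(d_in, d_partial, n);

    cudaMemcpy(host_partial.data(), d_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    return host_partial;
}
```

Combining the per-block results is left to the host (or a second kernel launch) precisely because there is no barrier across blocks inside a kernel like this one.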
So the answer to your question is: CUDA has aspects of all three, but SIMT (which I have never seen used except for CUDA) is probably the best description. SIMT is explained briefly in the intro chapter of the CUDA Programming Guide.
Thanks Seibert! That helped.