Here’s what I know about SIMD and SIMT:
(Though, from the point of view of a C/C++ programmer whose aim is simply to map an algorithm into a data-parallel version to run on CUDA, the two amount to much the same thing.) :D
In SIMD, you need to specify the data arrays, the instruction to apply to them, AND THE INSTRUCTION WIDTH (how many elements one instruction processes).
Eg: Say you want to add two integer arrays of length 16. A SIMD instruction might look like this (the mnemonic is made up for the demo):
add.16 arr1 arr2
However, SIMT doesn’t make you deal with the instruction width. Essentially, you just write the per-element operation:
arr1[i] + arr2[i]
and then launch as many threads as there are elements in the array.
Note that if the array size were, say, 32, SIMD EXPECTS you to explicitly issue two such ‘add.16’ instructions!
Whereas with SIMT, the code stays the same; you just launch 32 threads instead of 16.
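In CUDA terms, the SIMT version is just a kernel where each thread handles one element. A minimal sketch (the kernel name, guard, and launch configuration are my own, not from the text above):

```cuda
__global__ void add(const int *arr1, const int *arr2, int *out, int n)
{
    /* each thread computes its own global index and handles one element */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  /* guard: launching a few extra threads is harmless */
        out[i] = arr1[i] + arr2[i];
}

/* launch one thread per element -- same kernel whether n is 16 or 32: */
/* add<<<(n + 255) / 256, 256>>>(d_arr1, d_arr2, d_out, n);           */
```

The kernel body is exactly the `arr1[i] + arr2[i]` line from above; the grid/block arithmetic in the launch is the only place the element count appears, and the kernel itself never mentions a width.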