I’m new to CUDA and confused about the relationship between the warp size and the number of CUDA cores.
It’s known that one SM of a GT200 GPU has 8 CUDA cores, and each core executes 4 threads of a warp (one instruction issued over 4 clock cycles), so the warp size is 8 × 4 = 32.
In a GTX 460 GPU, an SM has 48 CUDA cores. Why does it have the same warp size as the GT200?
Are there any idle CUDA cores in an SM?
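In case it helps, this is the minimal query I would use to read these numbers off the device (standard cudaGetDeviceProperties from the CUDA runtime API; device 0 is just an assumption):

    // query_warp.cu -- print warp size and SM count for device 0
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device:             %s\n", prop.name);
        printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("Warp size:          %d\n", prop.warpSize);   // 32 on both GPUs
        printf("SM count:           %d\n", prop.multiProcessorCount);
        return 0;
    }

As I understand it, both cards report a warp size of 32 even though the per-SM core counts differ, which is exactly what confuses me.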
Compute capability 2.1 devices are able to issue multiple (actually, 2) arithmetic instructions from the same warp in parallel.
Starting with compute capability 2.0 (e.g. GTX 480), instructions from two warps are issued in parallel, each to 16 cores (a half-warp at a time, so a full warp’s instruction executes over 2 cycles). In 2.1, one of the two warps executing in parallel is allowed to issue two arithmetic instructions, so that all 48 cores can be saturated.
Issuing multiple instructions per cycle isn’t actually new, by the way: 1.x devices could already issue an fmad and an fmul in parallel.
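To make the saturation arithmetic concrete, here is a rough host-only sketch (plain C). The per-SM core counts and issue widths are just the figures for the three parts discussed above, with 2.1’s third issue slot reflecting the dual-issue behaviour described earlier. It’s an illustration, not a hardware model:

    /* cores_vs_issue.c -- back-of-the-envelope check of the saturation
     * argument above; the numbers cover only these three parts. */
    #include <stdio.h>

    struct sm_config {
        const char *name;
        int cores;       /* CUDA cores per SM */
        int lane_width;  /* cores one issued warp instruction occupies per cycle */
        int issue_slots; /* warp instructions issuable per cycle */
    };

    int main(void)
    {
        const int warp = 32; /* identical on all three architectures */
        struct sm_config cfg[] = {
            { "GT200 (CC 1.3)",  8,  8, 1 }, /* one scheduler                */
            { "GF100 (CC 2.0)", 32, 16, 2 }, /* two schedulers, single-issue */
            { "GF104 (CC 2.1)", 48, 16, 3 }, /* one scheduler may dual-issue */
        };

        for (int i = 0; i < 3; ++i) {
            int cycles = warp / cfg[i].lane_width;         /* cycles per warp instruction */
            int needed = cfg[i].cores / cfg[i].lane_width; /* concurrent instructions to fill all cores */
            printf("%-15s %2d cores, %d cycles/warp instr., needs %d concurrent (can issue %d): %s\n",
                   cfg[i].name, cfg[i].cores, cycles, needed, cfg[i].issue_slots,
                   cfg[i].issue_slots >= needed ? "saturable" : "cores idle");
        }
        return 0;
    }

Running it shows that 2.1 only stays saturable because of that third issue slot: without dual-issue, 16 of the 48 cores would sit idle. So the cores aren’t meant to be idle, but keeping them all busy on 2.1 depends on the scheduler finding a second independent instruction in the warp.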