Updating my records

For the GTX 4xx and 5xx

How many threads per SM?
How many blocks per SM?

Thanks,
Alexander.

See Appendix G of the CUDA Programming Guide: 1536 resident threads and 8 resident blocks per SM.
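To make those two limits concrete, here is a rough sketch of how they interact for a given block size. It only considers the thread and block limits quoted above; register and shared memory usage, which can lower the numbers further, are deliberately ignored, and the function names are made up for illustration.

```python
# Per-SM residency limits quoted above for the GTX 4xx/5xx.
MAX_THREADS_PER_SM = 1536
MAX_BLOCKS_PER_SM = 8


def resident_blocks_per_sm(threads_per_block: int) -> int:
    """Blocks one SM can hold, looking only at the thread and block limits."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)


def resident_threads_per_sm(threads_per_block: int) -> int:
    """Threads resident on one SM for the given block size."""
    return resident_blocks_per_sm(threads_per_block) * threads_per_block


if __name__ == "__main__":
    for size in (64, 192, 256, 512, 1024):
        print(size, resident_blocks_per_sm(size), resident_threads_per_sm(size))
```

With small blocks the 8-block limit is reached first (64 threads per block gives only 512 resident threads), so under these assumptions a block needs at least 192 threads before the SM can reach 1536.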

I am surprised that the GTX 480 and the GTX 580 can accommodate fewer threads than the GT200. Interesting… so when the same kernel runs on a GT200 and on a GTX 480, the performance difference comes down to warp scheduling and clock speed.

No. Although the GF100 has fewer MPs than the GT200, it has double the number of cores and, as a result, a much higher average IPC. Clock speeds on the GF100 are about the same as, or even a little lower than, on the GT200.

Not just double, four times. This is why I am surprised by the 8 blocks and 1536 threads. I do not understand what your "no" refers to; can you be more specific? OK, I was wrong about the clock speed; you are right that instructions per clock is the more accurate term. I still do not understand what your "no" is aimed at, though. :-)

Is your objection about the number of threads? Actually, it should not have been a "no" but a "yes" with a correction about IPC. The GF100 can keep fewer threads resident than the GT200, but it makes up for this with the increased number of SPs (or CUDA cores, as the magazines call them to sound fancier), which raises the IPC. So on average, for the same number of threads, the GF100 is faster. I think this concludes it, unless you object.

GT200: 240 cores. GF100: 448/480 cores. GF110: 480/512 cores. I.e. double the number of cores.
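For anyone checking these totals, a small sketch of the arithmetic follows. The SM counts for the GTX 480 and GTX 580 are derived from the core counts above divided by 32 cores per SM, and the 1024 resident threads per SM for the GT200 is taken from the compute capability 1.3 column of the Programming Guide rather than from this thread; the board names and the `CHIPS` table are only for illustration.

```python
# name: (SM count, CUDA cores per SM, max resident threads per SM)
# The GT200 thread limit (1024) is an outside figure, not from this thread.
CHIPS = {
    "GTX 280 (GT200)": (30, 8, 1024),
    "GTX 480 (GF100)": (15, 32, 1536),
    "GTX 580 (GF110)": (16, 32, 1536),
}


def chip_totals(sm_count: int, cores_per_sm: int, threads_per_sm: int):
    """Return (total cores, total resident threads) for the whole chip."""
    return sm_count * cores_per_sm, sm_count * threads_per_sm


if __name__ == "__main__":
    for name, spec in CHIPS.items():
        cores, threads = chip_totals(*spec)
        print(f"{name}: {cores} cores, {threads} resident threads")
```

Under these figures the full GF100/GF110 parts have twice the cores of the GT200 or more, yet fewer resident threads chip-wide, even though each Fermi SM holds more threads than a GT200 SM.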

The IPC increase comes from the fact that a single GF100/GF110 MP can dual-issue a pair of half-warps every cycle, whereas a GT200 MP can only issue a single warp every 4 clock cycles.

No, I object. Everything is per SM: a block executes on a single SM, so that is where you should look. The GT200 has 30 SMs with 8 CUDA cores each, so the threads on an SM are served by just 8 SPs, whereas on the GF100 they are served by 32 SPs. The advantage in warp execution comes from the increased SP count.
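The "double" and "four times" figures in this thread can both be reproduced with a very simplified issue model, sketched below. It assumes only what was stated above (one warp per 4 cycles on a GT200 SM, two half-warps per cycle on a GF100 SM) and a 15-SM GTX 480; it ignores the different clock domains, special function units, memory latency and everything else that affects real throughput. The function names are hypothetical.

```python
WARP_SIZE = 32


def gt200_lanes_per_cycle_per_sm() -> int:
    """One full warp issued over 4 cycles: 32 / 4 thread-instructions per cycle."""
    return WARP_SIZE // 4


def gf100_lanes_per_cycle_per_sm() -> int:
    """Two half-warps issued every cycle: 2 * 16 thread-instructions per cycle."""
    return 2 * (WARP_SIZE // 2)


def chip_lanes_per_cycle(sm_count: int, lanes_per_sm: int) -> int:
    """Chip-wide thread-instructions per cycle in this simplified model."""
    return sm_count * lanes_per_sm


if __name__ == "__main__":
    gt200_sm = gt200_lanes_per_cycle_per_sm()
    gf100_sm = gf100_lanes_per_cycle_per_sm()
    print("per-SM ratio:", gf100_sm / gt200_sm)
    # Assumed SM counts: 30 for the GT200, 15 for a GTX 480.
    gt200_chip = chip_lanes_per_cycle(30, gt200_sm)
    gf100_chip = chip_lanes_per_cycle(15, gf100_sm)
    print("chip-wide ratio:", gf100_chip / gt200_chip)
```

In this model the per-SM ratio is 4 and the chip-wide ratio is 2, because the GTX 480 has half as many SMs as the GT200.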
