Updating my records

AlexanderAgathos · February 7, 2011, 5:29pm

For the GTX 4xx and 5xx

How many threads per SM?
How many blocks per SM?

Thanks,
Alexander.

seibert · February 7, 2011, 5:51pm

See Appendix G of the CUDA Programming Guide: 1536 threads and 8 blocks per SM.
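
As a rough illustration of how those two limits interact, here is a minimal Python sketch. It assumes only the two figures quoted above (1536 resident threads and 8 resident blocks per SM) and ignores register and shared-memory usage, which can lower the result further. The helper name is made up for this example.

```python
# Per-SM limits quoted above for the GTX 4xx/5xx.
MAX_THREADS_PER_SM = 1536
MAX_BLOCKS_PER_SM = 8


def resident_blocks_per_sm(threads_per_block,
                           max_threads=MAX_THREADS_PER_SM,
                           max_blocks=MAX_BLOCKS_PER_SM):
    """Blocks one SM can hold at once, considering only the two limits above.

    Hypothetical helper: register and shared-memory limits are ignored.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return min(max_blocks, max_threads // threads_per_block)


# Columns: block size, resident blocks per SM, resident threads per SM.
for block_size in (64, 128, 192, 256, 512, 1024):
    blocks = resident_blocks_per_sm(block_size)
    print(block_size, blocks, blocks * block_size)
```

With these two limits alone, blocks of 192 threads are the smallest that fill all 1536 thread slots; smaller blocks hit the 8-block cap first.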

AlexanderAgathos · February 7, 2011, 6:19pm

I am surprised that the GTX 480 and the GTX 580 can accommodate fewer threads than the GT200. Interesting… so when the same kernel runs on a GT200 and on a GTX 480, the performance difference lies in warping and clock speed.

avidday · February 7, 2011, 6:59pm

No. Although the GF100 has fewer MPs than the GT200, it has double the number of cores, and as a result a much higher average IPC than the GT200. Clock speeds on the GF100 are about the same as, or even a little lower than, on the GT200.

AlexanderAgathos · February 7, 2011, 7:18pm

Not just double: four times. This is why I am surprised by the 8 blocks and 1536 threads. I do not understand what your "no" refers to; can you be more specific? OK, I was wrong about the clock speed. You are right, instructions per clock is the more accurate term. I still do not understand what your "no" applies to, though. :-)

As I understand it, your objection is to the number of threads? Actually it shouldn't have been a "no" but a "yes" with a correction about IPC. The number of threads that can be resident on the GF100 is lower than on the GT200, but the GF100 makes up for this with the increased number of SPs (or "CUDA cores", as the magazines call them to sound fancier), which increases the IPC. So on average, for the same number of threads, the GF100 is faster. I think this concludes it, unless you object.
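
To put numbers on the thread-count comparison, here is a small Python sketch. The 1536 threads per SM is the figure quoted earlier in the thread. The SM counts (30, 15, 16) and the 1024-thread per-SM limit for the GT200 are assumptions taken from the published specifications for those chips, so check them against Appendix G before relying on them.

```python
# name: (SM count, max resident threads per SM)
# Only the 1536 figure is quoted in this thread; the rest are assumptions.
GPUS = {
    "GT200": (30, 1024),
    "GF100 (GTX 480)": (15, 1536),
    "GF110 (GTX 580)": (16, 1536),
}


def max_resident_threads(sm_count, threads_per_sm):
    """Upper bound on threads resident on the whole chip at once."""
    return sm_count * threads_per_sm


totals = {name: max_resident_threads(*spec) for name, spec in GPUS.items()}
for name, total in totals.items():
    print(name, total)
```

Under these assumptions the per-SM limit is higher on the GTX 480 and GTX 580, while the chip-wide total is lower because there are about half as many SMs.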

avidday · February 7, 2011, 7:41pm

GT200: 240 cores. GF100: 448/480 cores. GF110: 480/512 cores. I.e., double the number of cores.

The IPC increase comes because a single GF100/GF110 MP can dual issue a pair of half-warps at each cycle, whereas the GT200 can only issue a single warp per 4 clock cycles.

AlexanderAgathos · February 7, 2011, 8:04pm

No, I object. Everything is per SM: a block is executed on one SM, so this is where you should look. The GT200 has 30 SMs with 8 CUDA cores each. So the threads on an SM are served by just 8 SPs, whereas on the GF100 they are served by 32 SPs. The warp-issue advantage comes from the increased number of SPs.
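
The two ways of counting in this exchange (per SM versus per chip) can be laid side by side in a few lines of Python. The per-SM core counts (8 and 32) and the 30 SMs of the GT200 come from the posts above; the 15 SMs for the GTX 480 is derived here as 480 / 32. Treating "cores / 32" as peak warp instructions per clock is a simplification made for this sketch: it ignores scheduling details, special-function units, and memory stalls.

```python
WARP_SIZE = 32

# name: (SM count, cores per SM)
CHIPS = {
    "GT200": (30, 8),
    "GF100 (GTX 480)": (15, 32),
}


def summarize(sm_count, cores_per_sm):
    """Peak figures only; a simplified model, not a performance prediction."""
    return {
        "total_cores": sm_count * cores_per_sm,
        # Clocks for one SM to push a 32-thread warp through its cores.
        "clocks_per_warp_instr": WARP_SIZE / cores_per_sm,
        # Warp instructions the whole chip can retire per clock at best.
        "peak_warp_instr_per_clock": sm_count * cores_per_sm / WARP_SIZE,
    }


summary = {name: summarize(*spec) for name, spec in CHIPS.items()}
for name, row in summary.items():
    print(name, row)
```

In this model the per-SM ratio is 4x (32 cores versus 8) while the per-chip ratio is 2x (480 cores versus 240), which is why both "double" and "four times" appear in the posts above.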
