C2050 / Fermi limits

I have a C2050 on Windows 7 64-bit. When I look at maxThreadsPerBlock from cudaGetDeviceProperties I get 1024, which makes sense: 32 warps of 32 threads each. But maxThreadsDim[0], [1], [2] gives 1024, 1024, 64 for the x, y, and z dimensions of a thread block, which would far exceed 32 * 32. Are there really thread blocks with 1024 x 1024 x 64 threads? And if such a block does not form 1024*1024*64/32 warps, what happens to the threads beyond warp 32? Do they get scheduled to run on the MP in SMT fashion if they are not in a warp?

On another, maybe related point: do all 32 threads with fixed threadIdx.y and threadIdx.z (free in the value of threadIdx.x) live in the same warp? I am writing some programs to test this kind of thing, but it would be good to know from the start. Can I or can't I define giant thread blocks with dim3 B(1024,1024,64)? Will all of this block be seen as warps scheduled on the MP to which the block is assigned? Since each MP has 32 CUDA cores (SPs), maybe all maxThreadsPerBlock means is that this is the maximum that can run at once on the MP, but all warps eventually get their turn. Is this right?

Another topic:
I read the articles from pgroup.com to try to glean what's going on, but sometimes I can't be sure whether what is said applies to the C2050 (Fermi). At one point an article states that no more than 8 thread blocks can run on a given MP at one time. Is this true for the C2050 as well? Does this mean that only 8 thread blocks take part in warp scheduling on the MP?

Another question. I read that only 48 warps can be simultaneously active at once across all the MPs (and I have 14 of them on the C2050). The MPs seem to be independent agents, so why would there be a limit across all of them?

The block dimension limits are constraints in addition to the threads-per-block limit, not a separate allowance. So the maximum block sizes are shapes like (1024,1,1), (512,2,1), (256,4,1), or (256,2,2): a valid block must satisfy blockDim.x * blockDim.y * blockDim.z <= 1024, with each dimension also within maxThreadsDim (so blockDim.z <= 64). A dim3 B(1024,1024,64) block is therefore not launchable.