Thread Scheduling / Maximum threads per block in each dimension vs. maximum threads per SM


I am new to CUDA programming and I am stuck on this.

On my PC with a GeForce GT 220 card, the maximum number of threads per SM is 512 and maxBlockDim is (512, 512, 64).

When I allocate more than 512 threads per block in one dimension, the program crashes.

In another system with a Quadro 600 card, the maximum number of threads per SM is 1536 and maxBlockDim is (1024, 1024, 64).

But when I allocate 1536, 2048, or 4096 threads per block in one dimension, the program executes properly; I hardly ever get an error.

Is there a practical limit on the number of threads per block? And what happens if the number of threads increases?

If you are allocating more threads per block than is allowed (note that on compute capability 2.x the maximum number of threads per block is 1024, even though the maximum number of threads per SM is 1536), then an error code will be returned by one of the next CUDA function calls. How are you checking the return codes of the CUDA functions?
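As an illustration (not from the original posts), here is a minimal sketch of both checking CUDA return codes and querying the actual per-launch limit with cudaGetDeviceProperties; the CUDA_CHECK macro name is my own invention:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Check every CUDA runtime call's return code. Kernel launches don't
// return one, so follow each launch with cudaGetLastError().
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t e = (call);                                         \
        if (e != cudaSuccess)                                           \
            fprintf(stderr, "CUDA error %d at %s:%d: %s\n",             \
                    (int)e, __FILE__, __LINE__, cudaGetErrorString(e)); \
    } while (0)

int main()
{
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));

    // maxThreadsPerBlock is the hard per-launch limit (512 on compute
    // capability 1.x, 1024 on 2.x). The per-SM thread count is an
    // occupancy limit, not a launch limit.
    printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
    printf("maxThreadsDim: (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1],
           prop.maxThreadsDim[2]);
    return 0;
}
```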


Thanks for your reply.

This is my Kernel Function

__global__ void myKernel(int *arr)
{
    arr[threadIdx.x] = threadIdx.x;
}


In main(), I am fetching the error like this

int size = 1025; //1024 is the limit in 2.x


err = cudaGetLastError();

printf("\nError %d = %s", err, cudaGetErrorString(err));

The error is

Error 9: invalid configuration argument

When I print the result, I get 0, 1, 2, …, 1023, garbage.

But I am assigning arr[threadIdx.x] = threadIdx.x

so there must be a clear memory violation.

How can it assign arr[1024] to some value ?

Thanks and Regards,


I’m not sure I understand the problem. Since you got a CUDA error when trying to launch your kernel, the contents of dArr will be undefined. It might be left with whatever values were already in that part of GPU memory when you started. CUDA does not zero out memory when you allocate it.
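To make this concrete, here is a hedged sketch (my own reconstruction, reusing the kernel above; the variable names dArr and hArr are assumptions) that zeroes the allocation first, so stale GPU memory cannot masquerade as kernel output when the launch is rejected:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(int *arr)
{
    arr[threadIdx.x] = threadIdx.x;
}

int main()
{
    const int size = 1025;  // exceeds the 1024 threads-per-block limit on 2.x
    int *dArr = NULL;
    int hArr[size];

    cudaMalloc((void **)&dArr, size * sizeof(int));
    // Zero the allocation: CUDA does not clear memory for you.
    cudaMemset(dArr, 0, size * sizeof(int));

    myKernel<<<1, size>>>(dArr);  // rejected: too many threads per block
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("Launch failed: %s\n", cudaGetErrorString(err));

    // The copy still succeeds; thanks to the memset, hArr is all zeros,
    // showing the kernel never ran instead of printing leftover garbage.
    cudaMemcpy(hArr, dArr, size * sizeof(int), cudaMemcpyDeviceToHost);
    printf("hArr[0] = %d, hArr[1024] = %d\n", hArr[0], hArr[1024]);

    cudaFree(dArr);
    return 0;
}
```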