How to use dim3 threadsPerBlock and numBlocks when parallelizing loops

I have the following kernel:

__global__ void kernel(int *d_array) {
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 2000000 && y < 12500 && z < 100000) {
        /* do stuff */
    }
}

It compiles fine, but at runtime the launch fails with: invalid configuration argument.
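For reference, here is a minimal sketch of how I see the error, with an error check right after the launch (the main scaffolding and the cudaMalloc size are just placeholders):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(int *d_array) { /* as above */ }

int main() {
    int *d_array;
    cudaMalloc(&d_array, 1024 * sizeof(int));  // placeholder allocation

    dim3 threadsPerBlock(1024, 1024, 64);      // the configuration I tried
    dim3 numBlocks(1954, 13, 1563);
    kernel<<<numBlocks, threadsPerBlock>>>(d_array);

    // The launch is rejected before the kernel ever runs, so the error
    // is available immediately:
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));  // prints "invalid configuration argument"

    cudaFree(d_array);
    return 0;
}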

In the end I found out that I can only use dim3 threadsPerBlock as follows:

dim3 threadsPerBlock(1, 32, 32);

The CUDA C Programming Guide says: “A thread block size of 16x16 (256 threads), although arbitrary in this case, is a common choice.” That does not help here, and I could not find any information on how dim3 threadsPerBlock works when all three dimensions are used.

So why am I not able to use something like dim3 threadsPerBlock(1024, 1024, 64) together with dim3 numBlocks(1954, 13, 1563)?

Appendix H of the CUDA Programming Guide states that the maximum number of threads per block is 1024 across all currently supported GPU architectures. The per-dimension block limits (1024 in x, 1024 in y, 64 in z) apply individually, but the product of the three dimensions must also stay at or below 1024. threadsPerBlock(1024, 1024, 64) tries to configure a block of 1024 x 1024 x 64 ≈ 67 million threads, far beyond that limit, which is why the launch is rejected with invalid configuration argument.
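For the sizes in your kernel, a configuration along these lines stays within the limits. This is only a sketch: divUp is a hypothetical ceiling-division helper, and d_array is assumed to be allocated already.

// Hypothetical helper: ceiling division, so the grid covers the whole range.
unsigned int divUp(unsigned int n, unsigned int block) {
    return (n + block - 1) / block;
}

dim3 threadsPerBlock(1, 32, 32);  // 1 * 32 * 32 = 1024 threads, at the limit
dim3 numBlocks(divUp(2000000, threadsPerBlock.x),   // 2,000,000 blocks in x
               divUp(12500,   threadsPerBlock.y),   //       391 blocks in y
               divUp(100000,  threadsPerBlock.z));  //     3,125 blocks in z
kernel<<<numBlocks, threadsPerBlock>>>(d_array);

Each grid dimension is within its per-dimension limit (x up to 2^31 - 1, y and z up to 65535), and the if-guard in your kernel handles the overhang in y (391 * 32 = 12512 > 12500). You can also query the actual limits of your device at runtime instead of hard-coding them:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
// prop.maxThreadsPerBlock  -> 1024 on current GPUs
// prop.maxThreadsDim[0..2] -> per-dimension block limits, e.g. 1024, 1024, 64
// prop.maxGridSize[0..2]   -> per-dimension grid limits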

Thanks, I missed that one. So that means threadsPerBlock(1, 32, 32) gives 1 x 32 x 32 = 1024 threads per block, right?

Thanks, I found a post by Robert_Crovella explaining that.