Invalid Configuration Argument

minterciso · December 16, 2018, 7:27pm

Hello all, I’m studying CUDA and trying to optimize some test code, and I’ve reached a point where I’m clearly missing something.
I have a GTX 1060, and according to deviceQuery these are the thread and block limits:

  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)

So I expected code like this to work:

#define POP_SIZE 160
#define LEN_SIZE 20

...
                
dim3 threadsPerBlock(32,32);
dim3 numBlocks(POP_SIZE/threadsPerBlock.x, LEN_SIZE/threadsPerBlock.y);
fitness<<<numBlocks, threadsPerBlock>>>(d_dest, d_pop);
CHECK(cudaPeekAtLastError());
CHECK(cudaMemcpy(h_pop, d_pop, sizeof(individual)*POP_SIZE, cudaMemcpyDeviceToHost));
...

Even though 32x32 = 1024 threads per block is within the limit, I’m getting an “invalid configuration argument” error. Everything I found in the CUDA Programming Guide says the maximum number of threads per block is 1024; page 9 states it clearly:

There is a limit to the number of threads per block, since all threads of a block are
expected to reside on the same processor core and must share the limited memory
resources of that core. On current GPUs, a thread block may contain up to 1024 threads.

If however I change the dimension of the threadsPerBlock to (32,20), it works flawlessly. So what gives? What am I understanding wrong about kernel launch sizes?

If it’s of interest, here’s the kernel:

__global__ void fitness(char s_dest[LEN_SIZE], individual *pop)
{
	unsigned int pop_idx = threadIdx.x + blockDim.x * blockIdx.x;
	unsigned int str_idx = threadIdx.y + blockDim.y * blockIdx.y;
	individual *ind = NULL;
	if(pop_idx < POP_SIZE && str_idx < LEN_SIZE)
	{
		ind = &pop[pop_idx];
		unsigned int l_fit = abs( (int)ind->s[str_idx] - (int)s_dest[str_idx]);
		atomicAdd(&ind->fitness, l_fit);
	}
}

I don’t like the atomicAdd() there, but that’s a matter for another topic.

Thank you all.

njuffa · December 16, 2018, 7:33pm

In your case

LEN_SIZE/threadsPerBlock.y

(= 20/32) is zero under integer division, which is not a valid value for a grid dimension (it needs to be > 0).
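A common way to avoid a zero grid dimension is to round the division up instead of truncating it. The following is a minimal host-side sketch of that arithmetic, not code from the thread: ceil_div and print_grid are hypothetical helper names, and the block size of 32 is taken from the question. It is plain C++ so the values can be checked without a GPU.

```cpp
#include <cstdio>

// Sizes from the question.
#define POP_SIZE 160
#define LEN_SIZE 20

// Hypothetical helper (not part of the CUDA runtime): integer division
// rounded up, so any nonzero n yields at least one block.
constexpr unsigned int ceil_div(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

// Truncating division, as in the original launch: the y dimension becomes 0.
static_assert(POP_SIZE / 32u == 5u, "x: 160/32 divides evenly");
static_assert(LEN_SIZE / 32u == 0u, "y: 20/32 truncates to zero");

// Rounded-up division: both grid dimensions are at least 1.
static_assert(ceil_div(POP_SIZE, 32u) == 5u, "x is unchanged");
static_assert(ceil_div(LEN_SIZE, 32u) == 1u, "y now gets one block");

// Prints the grid a given block shape would produce with each formula.
void print_grid(unsigned int bx, unsigned int by)
{
    std::printf("block (%u,%u): truncated grid (%u,%u), rounded-up grid (%u,%u)\n",
                bx, by, POP_SIZE / bx, LEN_SIZE / by,
                ceil_div(POP_SIZE, bx), ceil_div(LEN_SIZE, by));
}
```

Applied to the launch in the question, this would read dim3 numBlocks(ceil_div(POP_SIZE, threadsPerBlock.x), ceil_div(LEN_SIZE, threadsPerBlock.y)). The kernel's existing bounds check (pop_idx < POP_SIZE && str_idx < LEN_SIZE) keeps the surplus threads of the partially filled block from touching out-of-range memory. It also shows why a (32,20) block worked: 20/20 is 1 even with truncating division.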

minterciso · December 16, 2018, 7:37pm

Lord, thank you! I was so fixated on the thread block size that I forgot about this.
