Invalid Configuration Argument

I’m very new to CUDA. I am currently working on some simple kernels to get a better understanding of it.

Let me explain my problem:

I have a matrix with independent elements and I want to manipulate each element of the matrix.

My configuration looks like the following:

#define WIDTH  640

#define HEIGHT 480

#define NUM_THREADS 16

...

dim3 blockDim(NUM_THREADS, NUM_THREADS);

dim3 gridDim(WIDTH/blockDim.x, HEIGHT/blockDim.y);

kernel<<<gridDim, blockDim>>>(pixels, WIDTH, HEIGHT);

cutilCheckMsg("Kernel invocation failed");

The above code runs fine, but if I change NUM_THREADS to a higher value (e.g. 32), I always get an “Invalid Configuration Argument” error.

The code runs on a GTX260M with a maximum of 512 threads per block. So where is the problem?

If you set NUM_THREADS = 32, then dim3 blockDim(NUM_THREADS, NUM_THREADS) gives a thread block of 32 x 32 = 1024 threads. That exceeds the hardware limit of 512 threads per block, hence the “Invalid Configuration Argument” error.
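To make the arithmetic in this answer concrete, here is a minimal host-side sketch. The Dim3 struct is a hypothetical stand-in for CUDA's dim3 so that the snippet builds without the toolkit, and threadsPerBlock and fitsThreadLimit are illustrative names, not CUDA API calls. The 512 default is the limit quoted in this thread for this hardware.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this builds without the CUDA toolkit.
// Like dim3, unspecified dimensions default to 1.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Total number of threads in a block of this shape: the product of all dimensions.
unsigned threadsPerBlock(const Dim3& block) {
    return block.x * block.y * block.z;
}

// True if the block stays within the per-block thread limit.
// The default of 512 is the limit quoted in this thread (an assumption for other GPUs).
bool fitsThreadLimit(const Dim3& block, unsigned maxThreads = 512) {
    return threadsPerBlock(block) <= maxThreads;
}
```

With NUM_THREADS = 16 the block holds 256 threads and passes the check; with NUM_THREADS = 32 it holds 1024 and does not.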

Thanks for the answer. Stupid mistake :)

Well, I can’t figure out whether the product of all dimensions should be less than or equal to 512, or whether the block size can be 512 x 512 x 64 (Quadro FX 4600)? :confused:

512 is the maximum total number of threads per block. But the usable maximum can be even lower, depending on how many registers the kernel uses per thread.
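A rough sketch of how register usage can lower the limit is below. It assumes a register file of 8192 registers per multiprocessor, which is what I would expect deviceQuery to report on compute capability 1.0/1.1 parts; check your own output. The function name maxThreadsForKernel is made up for illustration, and the calculation ignores the hardware's allocation granularity, so the real limit may be somewhat lower.

```cpp
#include <algorithm>

// Rough upper bound on threads per block for a kernel that needs
// regsPerThread registers per thread.
// registersAvailable: registers available to one block (8192 is an assumption
//                     for compute capability 1.0/1.1; read it from deviceQuery).
// hardLimit:          the per-block thread limit quoted in this thread.
// Allocation granularity is ignored, so treat the result as an estimate only.
unsigned maxThreadsForKernel(unsigned regsPerThread,
                             unsigned registersAvailable = 8192,
                             unsigned hardLimit = 512) {
    if (regsPerThread == 0) {
        return hardLimit;  // avoid dividing by zero; nothing to constrain
    }
    return std::min(hardLimit, registersAvailable / regsPerThread);
}
```

Under these assumptions a kernel using 16 registers per thread can still launch 512 threads per block, while one using 32 registers per thread is limited to about 256.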

I am launching a kernel with the following configuration:

int NO_OF_PIX = 12000;
int NO_OF_SCAN = 4000;

dim3 block_dim(500,500);
dim3 grid_size(NO_OF_PIX/block_dim.x, NO_OF_SCAN/block_dim.y);

The kernel launch is successful and the results are correct. According to your statement, the kernel launch should fail? Also,

can anyone explain the following device query lines (possibly through a good example)? :unsure:

maximum threads in a block : 512 X 512 X 64
maximum grid size = 65535 X 65535 X 1

I own a Quadro FX 4600, and the above program fails for NO_OF_SCAN = 5000. :whoops:

That is invalid and will fail on any card. On these cards the limit is 512 threads total per block, with a maximum block dimension of 512 in x or y and 64 in z, i.e. x <= 512, y <= 512, z <= 64 and x*y*z <= 512. So the limiting valid block sizes are (512,1,1), (1,512,1), (256,2,1), (2,256,1), (2,4,64), etc. No ambiguity at all.
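The rule above can be written down as a small host-side check. This is only a sketch: Dim3 is a hypothetical stand-in for CUDA's dim3, isValidBlock is an illustrative name, and the constants are the limits quoted in this thread rather than values read from the device. In a real program you would take them from the device properties.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this builds without the CUDA toolkit.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Limits quoted in this thread (assumed here, not queried from the device).
const unsigned kMaxBlockX = 512;
const unsigned kMaxBlockY = 512;
const unsigned kMaxBlockZ = 64;
const unsigned kMaxThreadsPerBlock = 512;

// A block shape is valid only if every dimension is within its own limit
// AND the product of the dimensions is within the total thread limit.
bool isValidBlock(const Dim3& b) {
    if (b.x == 0 || b.y == 0 || b.z == 0) {
        return false;  // a dimension of zero is never a valid launch
    }
    if (b.x > kMaxBlockX || b.y > kMaxBlockY || b.z > kMaxBlockZ) {
        return false;  // per-dimension limit exceeded
    }
    // Widen before multiplying so the product cannot overflow.
    unsigned long long total = 1ULL * b.x * b.y * b.z;
    return total <= kMaxThreadsPerBlock;
}
```

For example, isValidBlock(Dim3(256, 2)) holds, while isValidBlock(Dim3(500, 500)) does not: each dimension is under 512, but the product is 250000 threads.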

The fact you think that a block of (500,500) works simply means you are not doing correct error checking and you don’t understand or are not correctly verifying the output of any kernel launched that way.

The problem was that I was using dim_Blocks instead of dim_Grid, and vice versa, in my kernel launch code :P … :confused:

And that is why I got away with 500,500 :P
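To see why the swapped arguments slipped through, here is a host-side sketch that rebuilds the configuration from the earlier post and checks it both ways. Dim3, Launch, makeLaunch and the validity helpers are all hypothetical names for illustration, and the limits are the ones quoted in this thread (512 threads per block, block dimensions up to 512 x 512 x 64, grid up to 65535 x 65535 x 1).

```cpp
// Hypothetical stand-in for CUDA's dim3, so this builds without the CUDA toolkit.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// The two arguments of kernel<<<grid, block>>>.
struct Launch {
    Dim3 grid;
    Dim3 block;
};

// Block limits quoted in this thread: 512 x 512 x 64 per dimension, 512 threads total.
bool isValidBlock(const Dim3& b) {
    return b.x >= 1 && b.x <= 512 && b.y >= 1 && b.y <= 512 &&
           b.z >= 1 && b.z <= 64 && b.x * b.y * b.z <= 512;
}

// Grid limits quoted in this thread: 65535 x 65535 x 1.
bool isValidGrid(const Dim3& g) {
    return g.x >= 1 && g.x <= 65535 && g.y >= 1 && g.y <= 65535 && g.z == 1;
}

// Rebuilds the configuration from the post. With swapped == true the two
// arguments trade places, as in the mistaken launch.
Launch makeLaunch(unsigned numPix, unsigned numScan, bool swapped) {
    Dim3 block_dim(500, 500);
    Dim3 grid_size(numPix / block_dim.x, numScan / block_dim.y);
    Launch launch;
    launch.grid  = swapped ? block_dim : grid_size;
    launch.block = swapped ? grid_size : block_dim;
    return launch;
}

bool isValidLaunch(const Launch& l) {
    return isValidGrid(l.grid) && isValidBlock(l.block);
}
```

With NO_OF_PIX = 12000 and NO_OF_SCAN = 4000, the intended launch has a 500 x 500 block and is rejected. The swapped launch has a 500 x 500 grid and a 24 x 8 block of 192 threads, which is within every limit, so it is accepted.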

Anyway, it was a typo. Sorry for bothering you all :unsure:

Thanks for your help …

Lesson learnt: threads per block can’t be greater than 512 :) … the individual dimensions can be at most 512, 512 and 64.

:)