Is this Correct?

Hi,

I have a kernel:

#define width 2400
#define height 1800

__global__ void foo(unsigned char *array)
{
    long idx = blockDim.x * blockIdx.x + threadIdx.x;
    long limit = width * height;

    // Some operations done here.

    if (idx < limit)
    {
        // body here
    }
}

The kernel is launched as:

foo<<<(width*height+511)/512, 512>>>( array );

But I get the error cudaErrorLaunchFailure.

My question is:

(1) Can I use <<<(width*height+511)/512, 512>>> as the grid dimensions and block dimensions respectively? I ask because (width*height+511)/512 is a very big number.

Try launching with smaller numbers.
If there is no error, the cause is that your numbers are too big, as they seem to be.

I think the limits are:
Maximum threads per block: 512
Maximum blocks per grid dimension: 65535

but check the reference manual to be sure.
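To put a number on it, here is a minimal host-side sketch that plugs the width, height and block size from the question into the same rounding-up division used in the launch. The names `blocksFor`, `kBlocks` and the other constants are mine, not from the original code, and 65535 is simply the limit quoted above.

```cpp
// Host-side arithmetic only; compiles as plain C++11 or inside a .cu file.
constexpr long kWidth = 2400;            // width from the question
constexpr long kHeight = 1800;           // height from the question
constexpr long kThreadsPerBlock = 512;   // block size from the launch

// Ceiling division: the smallest block count that covers n elements.
// (blocksFor is a hypothetical helper name.)
constexpr long blocksFor(long n, long threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

constexpr long kBlocks = blocksFor(kWidth * kHeight, kThreadsPerBlock);

// 2400 * 1800 = 4,320,000 elements, which needs 8438 blocks of 512 threads.
static_assert(kBlocks == 8438, "block count for a 2400x1800 array");
// 65535 is the per-dimension grid limit quoted in this thread.
static_assert(kBlocks <= 65535, "fits in a one-dimensional grid");
```

So for this particular size the grid comes out at 8438 blocks, which is under the 65535 figure, and the grid size by itself may not be what triggers the error.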

For a small grid size it works fine. :)

The same large grid size works for another function; only this particular function gives the error.

Can you tell me how this should be handled?

I don't know. I can only tell you that the manual says you will get a launch error if
CUDA is not able to launch even one block, particularly when there is a memory problem.

Isn’t this just this…st&p=542438 again?

The resource limits are clearly described in the CUDA user guide, as is how to calculate them, and in your other thread it was explained how to use compiler options to get the register and shared memory consumption of a given kernel. Why not actually do a spot of reading and thinking about your problem? You might actually learn something…
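As a rough illustration of that kind of calculation, the sketch below checks whether one block fits in the register file, given the registers-per-thread figure that nvcc reports when run with --ptxas-options=-v. The helper names are hypothetical, and the register file sizes in the comments (8192 and 16384, which I believe are the compute capability 1.0/1.1 and 1.2/1.3 figures) are assumptions to check against the guide for your device. Real allocation rounds up in chunks, so treat the result as optimistic, and remember that shared memory has to be checked separately.

```cpp
// Host-side arithmetic only; compiles as plain C++11 or inside a .cu file.

// Hypothetical helper: true if a block of threadsPerBlock threads, each using
// regsPerThread registers, fits in a register file of regsPerMultiprocessor.
// Ignores allocation granularity, so it is an optimistic estimate.
constexpr bool blockFitsRegisters(int regsPerThread, int threadsPerBlock,
                                  int regsPerMultiprocessor) {
    return static_cast<long>(regsPerThread) * threadsPerBlock
           <= regsPerMultiprocessor;
}

// Hypothetical helper: the largest block size the register file allows
// (again ignoring granularity and every other limit).
constexpr int maxThreadsForRegisters(int regsPerThread,
                                     int regsPerMultiprocessor) {
    return regsPerMultiprocessor / regsPerThread;
}

// Assumed register file of 8192: 16 registers per thread is the most
// that still lets a 512-thread block launch.
static_assert(blockFitsRegisters(16, 512, 8192), "16 * 512 = 8192 fits");
static_assert(!blockFitsRegisters(17, 512, 8192), "17 * 512 = 8704 does not");
```

If the kernel turns out to use more registers than a 512-thread block allows, a smaller block size (256, say) with a correspondingly larger grid is the usual way out.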

Your grid can’t be more than 65535 in each dimension. The largest grid can be 65535*65535 = 4,294,836,225 blocks. You can turn a one-dimensional grid into a 2-dimensional grid using advice from this thread. Or you could simply use (width+511)/512 for the x and height for the y dimension of the grid. I also believe avidday’s advice is very good.
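A sketch of the second suggestion, written as host-side C++ so the index arithmetic can be checked without a GPU. It assumes one grid row per image row and 512 threads per block; the function names are made up for illustration, and the comment shows what I would expect the equivalent lines inside the kernel to look like.

```cpp
#include <cstddef>
#include <vector>

constexpr int kWidth = 2400;            // width from the question
constexpr int kHeight = 1800;           // height from the question
constexpr int kThreadsPerBlock = 512;   // block size from the launch

// Grid shape for dim3 grid(kGridX, kGridY): x covers one row, y picks the row.
constexpr int kGridX = (kWidth + kThreadsPerBlock - 1) / kThreadsPerBlock;
constexpr int kGridY = kHeight;

// Mirrors what the kernel would compute:
//   int x = blockIdx.x * blockDim.x + threadIdx.x;
//   int y = blockIdx.y;
//   if (x < width) { /* use array[y * width + x] */ }
// Returns -1 for threads that fall past the end of the row.
constexpr long pixelIndex(int blockX, int blockY, int threadX) {
    return (blockX * kThreadsPerBlock + threadX) < kWidth
               ? static_cast<long>(blockY) * kWidth
                     + (blockX * kThreadsPerBlock + threadX)
               : -1;
}

// Walks every (block, thread) pair and checks each pixel is hit exactly once.
inline bool coversEveryPixelOnce() {
    std::vector<unsigned char> hits(
        static_cast<std::size_t>(kWidth) * kHeight, 0);
    for (int by = 0; by < kGridY; ++by)
        for (int bx = 0; bx < kGridX; ++bx)
            for (int tx = 0; tx < kThreadsPerBlock; ++tx) {
                const long idx = pixelIndex(bx, by, tx);
                if (idx >= 0) ++hits[static_cast<std::size_t>(idx)];
            }
    for (unsigned char h : hits)
        if (h != 1) return false;
    return true;
}
```

With these numbers the grid is 5 by 1800 blocks, so both dimensions stay well under 65535. The cost is that the last block in each row has some idle threads, which the x < width guard takes care of.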