I don’t know. All I can tell you is that the manual says you will get a launch error if CUDA is not able to launch even one block, especially when the problem is memory.
The resource limits are clearly described in the CUDA user guide, as is how to calculate them, and in your other thread it was explained how to use compiler options to get the register and shared memory consumption of a given kernel. Why not actually do a spot of reading and thinking about your problem? You might actually learn something…
Your grid can’t be larger than 65535 in each dimension, so the largest grid is 65535 × 65535 = 4,294,836,225 blocks. You can turn a one-dimensional grid into a two-dimensional one using the advice from this thread. Or you could simply use (width+511)/512 for the x dimension of the grid (integer division, which assumes 512 threads per block) and height for the y dimension. I also believe avidday’s advice is very good.
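To make the (width+511)/512 suggestion concrete, here is a rough sketch. The names GridDims, makeGrid, fitsLimit and processImage are made up for illustration, and the block size of 512 threads and the doubling done in the kernel are assumptions, not anything from the original poster's code. The host part is plain C++; the kernel is only compiled under nvcc.

```cpp
#include <cstddef>

// Hypothetical helper type: grid size in blocks for a width x height image.
struct GridDims { unsigned x, y; };

const unsigned kThreadsPerBlock = 512;  // assumed block size (threads along x)
const unsigned kMaxGridDim = 65535;     // per-dimension grid limit quoted above

// One row of blocks per image row; enough 512-thread blocks to cover a row.
GridDims makeGrid(unsigned width, unsigned height) {
    GridDims g;
    g.x = (width + kThreadsPerBlock - 1) / kThreadsPerBlock;  // rounds up
    g.y = height;
    return g;
}

// True if both grid dimensions are non-zero and within the limit.
bool fitsLimit(GridDims g) {
    return g.x >= 1 && g.y >= 1 && g.x <= kMaxGridDim && g.y <= kMaxGridDim;
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread handles one pixel.
__global__ void processImage(float *data, unsigned width, unsigned height) {
    unsigned col = blockIdx.x * blockDim.x + threadIdx.x;  // pixel column
    unsigned row = blockIdx.y;                             // pixel row
    // The last block in each row may stick out past the image edge.
    if (col < width && row < height) {
        data[(size_t)row * width + col] *= 2.0f;  // placeholder work
    }
}
#endif
```

A launch would then look something like `GridDims g = makeGrid(width, height); processImage<<<dim3(g.x, g.y), kThreadsPerBlock>>>(d_data, width, height);` after checking fitsLimit(g). With this layout the image height itself has to stay at or below 65535 rows.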