Error in CUDA 5.5 debug mode: "too many resources requested for launch"

My code works well in CUDA 5.0 debug mode (with the -G option).

But after I updated to CUDA 5.5, the debug build fails with the error “too many resources requested for launch”.
The code runs correctly when I reduce the block size from 512 to 256.
Evidently the GPU kernel uses too many registers.

When I compile the code without the -G option and keep the block size at 512, the error disappears.

Does the -G option in CUDA 5.5 limit the number of registers available to the application?

My environment is K20m + SLES11SP2 + CUDA 5.5.

When you compile, add the flags -Xptxas -v. This prints the number of registers used per thread and the amount of shared memory used by each kernel. The number of registers available to a block is limited: 32K for cc 2.0 and 64K for cc 3.0. I am not sure, but I think you can set a maxregcount flag to cap the register count, which forces the compiler to spill to local memory.
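To make the register limit concrete, here is a minimal sketch (not from the original posts) that asks the runtime how many registers per thread the compiled kernel uses and compares that with the per-block register budget of device 0. The kernel body, the helper names block_fits and report_kernel_resources, and the file name in the build comment are placeholders made up for illustration; cudaFuncGetAttributes and cudaGetDeviceProperties are standard runtime calls. With 64K registers per block, 512 threads leave at most 65536 / 512 = 128 registers per thread, while 256 threads leave 256, which would explain why halving the block size makes the launch succeed.

```cuda
// Build with verbose ptxas output (file name is hypothetical):
//   nvcc -G -Xptxas -v check_regs.cu
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel that fails to launch.
__global__ void my_kernel(float *result, unsigned int *index)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    result[i] = static_cast<float>(index[i]);
}

// Pure host arithmetic: does a block of `threads` threads, each using
// `regs_per_thread` registers, fit into `regs_per_block` registers?
// This ignores register allocation granularity, so it is optimistic.
inline bool block_fits(int regs_per_thread, int threads, int regs_per_block)
{
    return static_cast<long long>(regs_per_thread) * threads <= regs_per_block;
}

// Print what the compiled kernel needs next to what device 0 offers.
inline cudaError_t report_kernel_resources(int threads)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, my_kernel);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return err;

    printf("registers per thread      : %d\n", attr.numRegs);
    printf("static shared mem (bytes) : %zu\n", attr.sharedSizeBytes);
    printf("local mem/thread (bytes)  : %zu\n", attr.localSizeBytes);
    printf("max threads for kernel    : %d\n", attr.maxThreadsPerBlock);
    printf("registers per block (dev) : %d\n", prop.regsPerBlock);
    printf("%d threads fit            : %s\n", threads,
           block_fits(attr.numRegs, threads, prop.regsPerBlock) ? "yes" : "no");
    return cudaSuccess;
}
```

Calling report_kernel_resources(512) from main before the real launch, once in a -G build and once in a release build, should show whether the debug build pushes the per-thread register count past 128.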

The name of the nvcc command line flag for limiting the register count for all code in a compilation unit is -maxrregcount. See nvcc documentation. The extra ‘r’, i.e. “rreg”, refers to general purpose registers, rather than condition code registers, predicate registers, or address registers.

Also, I think more registers are used when you compile with the -g -G flags.

Also, if you have some idea of how many threads you're going to use for the kernel, put the __launch_bounds__ qualifier at the start of the function definition, e.g.:

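#define MAX_THREADS_PER_BLOCK 256   // largest block size the kernel will be launched with
#define MIN_BLOCKS_PER_SM 2         // desired minimum number of resident blocks per SM

__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
__global__ void my_kernel(float *result, unsigned int *index)
{
    // kernel body goes here
}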

That’s generally good practice to squeeze the last bit of performance out of the kernel :)
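For the 512-thread case from the original question, a fuller sketch might look like the following. It is only an illustration under assumptions: the kernel body, the extra n parameter, and the helpers blocks_for and launch_my_kernel are hypothetical and not from the posts above. Declaring 512 as the maximum block size asks the compiler to keep register usage low enough for such a block to launch; I would expect that to apply to -G builds as well, but I have not verified it. Checking cudaGetLastError right after the launch is how "too many resources requested for launch" gets reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_THREADS_PER_BLOCK 512   // block size used at launch
#define MIN_BLOCKS_PER_SM 1         // no extra occupancy requirement

// Hypothetical kernel; only the __launch_bounds__ usage matters here.
__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
__global__ void my_kernel(float *result, const unsigned int *index, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        result[i] = static_cast<float>(index[i]);
    }
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocks_for(int n, int threads)
{
    return (n + threads - 1) / threads;
}

// Launch the kernel on device pointers and report launch-time errors.
inline cudaError_t launch_my_kernel(float *d_result,
                                    const unsigned int *d_index, int n)
{
    int blocks = blocks_for(n, MAX_THREADS_PER_BLOCK);
    my_kernel<<<blocks, MAX_THREADS_PER_BLOCK>>>(d_result, d_index, n);

    // Launch-configuration problems surface here, not at the <<<>>> line.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

The block size passed at launch must not exceed the value given to __launch_bounds__, so keeping both tied to the same macro avoids a mismatch.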