My code works well in CUDA 5.0 debug mode (with the -G option).
But after I updated CUDA to 5.5, debug mode fails with the error “too many resources requested for launch”.
The code runs correctly when I reduce the block size from 512 to 256.
Obviously, the GPU kernel uses too many registers.
I tried compiling the code without the -G option and with the block size set to 512, and the error disappears.
Does the “-G” option in CUDA 5.5 limit the number of registers used in the app?
When you compile, add the flags -Xptxas -v. This reports the number of registers per thread and the amount of shared memory used. The number of registers available to a block is limited: for cc 2.0 it is 32k, while for 3.0 it is 64k. I am not sure, but I think you can set a flag, maxregcount, to cap register use and make the compiler spill to local memory.
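The same numbers can also be checked at run time. Below is a rough sketch, assuming the CUDA runtime API is available; `myKernel` is a hypothetical stand-in for the real kernel, and `regBudgetPerThread` and `blockFits` are helper names made up for illustration. The per-thread budget is only an upper bound, since it ignores register allocation granularity and the per-thread register cap of the architecture.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real one.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0f * data[i];
}

// Upper bound on registers per thread for a given block size.
// Ignores allocation granularity, so the real limit can be lower.
static int regBudgetPerThread(int regsPerBlock, int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return regsPerBlock / threadsPerBlock;
}

// Prints the kernel's resource usage and returns true if a block of
// blockSize threads is within the limit the runtime reports for it.
bool blockFits(int blockSize)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, myKernel) != cudaSuccess) return false;

    printf("registers per thread used : %d\n", attr.numRegs);
    printf("registers per block avail : %d\n", prop.regsPerBlock);
    printf("budget at %d threads      : %d\n", blockSize,
           regBudgetPerThread(prop.regsPerBlock, blockSize));
    printf("max threads for kernel    : %d\n", attr.maxThreadsPerBlock);

    // maxThreadsPerBlock already accounts for this kernel's register use.
    return blockSize <= attr.maxThreadsPerBlock;
}
```

Calling `blockFits(512)` from `main` in both the -G build and the release build should show whether the debug build's register count is what pushes the 512-thread launch over the limit.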
The name of the nvcc command line flag for limiting the register count for all code in a compilation unit is -maxrregcount. See nvcc documentation. The extra ‘r’, i.e. “rreg”, refers to general purpose registers, rather than condition code registers, predicate registers, or address registers.
Also, if you have some idea of how many threads you're going to use for the kernel, put the __launch_bounds__ qualifier at the start of the function definition, e.g.
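A minimal sketch of that placement, not the original poster's example: `scaleKernel`, `blocksFor`, `launchScale`, and the bound of 512 are all hypothetical names and values chosen for illustration. The qualifier tells the compiler the largest block size the kernel will be launched with, so it can keep register use low enough for that size; whether this is enough in a -G build is worth confirming with -Xptxas -v.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical upper bound on the block size used for this kernel.
#define MAX_THREADS_PER_BLOCK 512

// The qualifier goes between __global__ void and the kernel name.
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK)
scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Number of blocks needed to cover n elements (rounds up).
static int blocksFor(int n, int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launches the kernel with the declared block size and returns any launch
// error, e.g. "too many resources requested for launch".
cudaError_t launchScale(float *d_data, int n, float factor)
{
    scaleKernel<<<blocksFor(n, MAX_THREADS_PER_BLOCK),
                  MAX_THREADS_PER_BLOCK>>>(d_data, n, factor);
    return cudaGetLastError();
}
```

Unlike -maxrregcount, which applies to the whole compilation unit, the bound applies only to the kernel it is attached to. Launching with more threads per block than the declared bound will fail.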