too many resources requested to launch error

My code launches fine, but when I add and use the following macros:

#define SHL(x, s) ((unsigned short) ((x) << ((s) & 15)))
#define SHR(x, s) ((unsigned short) ((x) >> (16 - ((s) & 15))))
#define ROTL(x, s) ((unsigned short) (SHL((x), (s)) | (SHR((x), (s)))))
#define ROTR(x, s) ((unsigned short) (SHR((x), (s)) | (SHL((x), (s)))))

I get a “too many resources requested to launch” error. Any suggestions?

Thanks
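
A side note on the macros themselves, separate from the launch error: as written, ROTR expands to the same two terms as ROTL (SHR | SHL is just SHL | SHR), so both rotate left. Below is a minimal sketch of distinct 16-bit rotates in plain C. The function names rotl16 and rotr16 are made up for illustration, and in a kernel they would need a __device__ qualifier. This does not by itself change how many registers the kernel uses.

```c
#include <stdint.h>

/* Rotate a 16-bit value left by s bits; s is reduced modulo 16. */
static uint16_t rotl16(uint16_t x, unsigned s)
{
    s &= 15u;
    if (s == 0u)
        return x;                      /* avoid a shift by 16 */
    return (uint16_t)(((uint32_t)x << s) | ((uint32_t)x >> (16u - s)));
}

/* Rotate a 16-bit value right by s bits; s is reduced modulo 16. */
static uint16_t rotr16(uint16_t x, unsigned s)
{
    s &= 15u;
    if (s == 0u)
        return x;
    return (uint16_t)(((uint32_t)x >> s) | ((uint32_t)x << (16u - s)));
}
```

For example, rotl16(0x8001, 1) gives 0x0003 and rotr16(0x8001, 1) gives 0xC000.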

How many registers is your code using? How many threads do you have per block?

I would rule out any shared-memory limitation here.

You can find your register usage in the .cubin file. Use the “-keep” option of nvcc while compiling to keep the generated .cubin file.

I would make the same comments as Sarnath :)

I suggest you use the CUDA Occupancy Calculator.

http://developer.download.nvidia.com/compu…_calculator.xls

You will probably find that the occupancy is 0% for the parameters your program uses (for example, the number of threads, the grid size, the number of registers, and the amount of shared memory used).

Please check it. :)
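
The arithmetic behind that check can be sketched roughly in plain C. This is a simplified approximation, not the calculator's actual logic: it ignores register and shared-memory allocation granularity, and the limits are the ones I am assuming for compute capability 1.0/1.1 hardware (the thread does not say which GPU is in use). The function name blocks_per_sm is made up for illustration.

```c
/* Assumed limits for compute capability 1.0/1.1 devices. */
enum {
    MAX_THREADS_PER_BLOCK = 512,
    MAX_THREADS_PER_SM    = 768,
    MAX_BLOCKS_PER_SM     = 8,
    REGS_PER_SM           = 8192,
    SMEM_BYTES_PER_SM     = 16384
};

/* Estimate how many blocks fit on one multiprocessor at the same time.
 * A result of 0 means not even one block fits, which corresponds to
 * 0% occupancy and a failed launch. */
static int blocks_per_sm(int threads_per_block,
                         int regs_per_thread,
                         int smem_bytes_per_block)
{
    int limit = MAX_BLOCKS_PER_SM;
    int by_threads;

    if (threads_per_block <= 0 || threads_per_block > MAX_THREADS_PER_BLOCK)
        return 0;

    by_threads = MAX_THREADS_PER_SM / threads_per_block;
    if (by_threads < limit)
        limit = by_threads;

    if (regs_per_thread > 0) {
        /* Registers are allocated for every thread in the block. */
        int by_regs = REGS_PER_SM / (regs_per_thread * threads_per_block);
        if (by_regs < limit)
            limit = by_regs;
    }

    if (smem_bytes_per_block > 0) {
        int by_smem = SMEM_BYTES_PER_SM / smem_bytes_per_block;
        if (by_smem < limit)
            limit = by_smem;
    }

    return limit;
}
```

Under these assumed limits, blocks_per_sm(512, 20, 0) returns 0 because 512 threads at 20 registers each need 10240 registers, more than the 8192 available. Dropping to 16 registers per thread, or to fewer threads per block, makes the block fit again.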

I did and was able to remove a couple of variables, which solved the problem.

Thanks

In CUDA 1.1, you can have the compiler print the usage of registers, smem, and constant memory, so you no longer have to look at .cubin files. To do so, provide the “--ptxas-options=-v” option to nvcc.

Paulius