I got the following error:
Kernel execution failed!!! in file <gpuforward .cu>, line 101 : too many resources requested for launch.
I did a Google search and found that it may be caused by the kernel requiring too many registers or too much shared memory. I believe it is the registers, because the error goes away when I remove some of them. However, I do not understand why. I'm using a Tesla 1060. Running deviceQuery shows that it has 16384 registers per multiprocessor. I launch each block with 512 threads, so if each multiprocessor holds two blocks, each thread should be able to use 16384/1024 = 16 registers. My kernel uses only 14 registers, yet it seems I can use only 10 of them.
Moreover, I can declare an 11th register without using it, but as soon as I actually use it, I get the same error as above.
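To make my arithmetic above explicit, here is a small host-side sketch in plain C++ (no CUDA calls). The function name regsPerThreadBudget and its parameters are made up for illustration; the numbers are the ones from my deviceQuery output and launch configuration. It assumes the register file is divided evenly among all resident threads and ignores any allocation granularity the hardware may apply, so it only reproduces my estimate and is not a statement of the real limit.

```cpp
// Hypothetical helper: naive per-thread register budget, assuming the
// multiprocessor's register file is split evenly among resident threads.
constexpr int regsPerThreadBudget(int regsPerMultiprocessor,
                                  int threadsPerBlock,
                                  int blocksPerMultiprocessor) {
    // Return 0 for a nonsensical configuration instead of dividing by zero.
    return (threadsPerBlock <= 0 || blocksPerMultiprocessor <= 0)
               ? 0
               : regsPerMultiprocessor /
                     (threadsPerBlock * blocksPerMultiprocessor);
}

// My configuration: 16384 registers, 512 threads per block, 2 blocks resident.
static_assert(regsPerThreadBudget(16384, 512, 2) == 16,
              "two resident blocks of 512 threads");

// Same block size, but only one block resident on the multiprocessor.
static_assert(regsPerThreadBudget(16384, 512, 1) == 32,
              "one resident block of 512 threads");
```

Both checks are evaluated at compile time, so the file compiling cleanly confirms the two estimates (16 and 32 registers per thread). By this estimate my 14 registers should fit in either case, which is why the error confuses me.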
Did I misunderstand the limits on registers I can use?
Thank you very much.