The title already gives it away: one of my kernels fails with an error saying that I requested too many resources. I did some research and learned that this is usually caused by either shared memory or register usage. The kernel is launched with a dynamic amount of shared memory, so normally the verbose output of ptxas (-Xptxas -v) tells me nothing about it. However, for my test case I know what the amount will be (418 * sizeof(uint)), so I hardcoded it as a static shared memory pool instead. The output is as follows:
ptxas info : Used 17 registers, 6688 bytes smem, 44 bytes cmem
The number of threads per block is 418, so a block needs 17 * 418 = 7106 registers, which is well within the limits of my Tesla M2075 (cc 2.0). The amount of shared memory is not very impressive either, is it? Why then does this kernel report that I requested too many resources?
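For reference, here is my arithmetic as a small host-only C++ sketch. The struct and function names are made up for illustration. The limits are the per-block values I believe the CUDA C Programming Guide lists for compute capability 2.0, and rounding the block up to whole warps is my own simplifying assumption about how registers are allocated (the real allocation granularity may be coarser).

```cpp
#include <cstddef>

// Per-block limits for compute capability 2.0, taken from the CUDA C
// Programming Guide as far as I can tell (assumption: check your own device).
const int         kWarpSize           = 32;
const int         kMaxThreadsPerBlock = 1024;
const int         kMaxRegsPerBlock    = 32768;      // 32-bit registers
const std::size_t kMaxSmemPerBlock    = 48 * 1024;  // bytes

// Hypothetical container for the numbers reported by ptxas plus the launch size.
struct KernelUsage {
    int         regsPerThread;
    std::size_t smemBytes;
    int         threadsPerBlock;
};

// Number of warps needed to hold the block (the last warp may be partly empty).
inline int warpsPerBlock(int threads) {
    return (threads + kWarpSize - 1) / kWarpSize;
}

// Assumption: registers are reserved for whole warps, so a partly filled
// warp costs as much as a full one.
inline int regsPerBlock(const KernelUsage& k) {
    return warpsPerBlock(k.threadsPerBlock) * kWarpSize * k.regsPerThread;
}

// True if the block stays within all three per-block limits above.
inline bool fitsOnDevice(const KernelUsage& k) {
    return k.threadsPerBlock <= kMaxThreadsPerBlock
        && regsPerBlock(k)   <= kMaxRegsPerBlock
        && k.smemBytes       <= kMaxSmemPerBlock;
}
```

With my numbers, KernelUsage{17, 6688, 418} rounds up to 14 warps and 7616 registers, so fitsOnDevice returns true. This only checks registers, shared memory and threads per block, so it would not catch any other launch limit.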
Any ideas would be welcome :-)