As far as I know, the G80/G92 architecture has 8192 registers per multiprocessor. The Visual Profiler tells me my kernel uses 81 registers per thread.
81 * 96 = 7776 < 8192
Yet I get the dreaded “too many resources for launch” error when running 96 threads per block instead of 64. Even when I limit the kernel to 72 registers per thread, I get the same error. I don’t understand why.
Shared memory is not an issue (using 24 bytes static shared memory only).
Grid dim is on the order of (1000, 1, 1)
Block dim is (96, 1, 1)
This is bizarre. I want to achieve better occupancy, but can’t have it.
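
For reference, here is the arithmetic spelled out as a small C++ sketch. The first function is the naive per-thread product I used above. The second is only a guess at how the hardware might actually reserve registers: it assumes allocation happens per block, with the warp count rounded up to a multiple of 2 and the total rounded up to a multiple of 256. Those two granularities and all the names here are my assumptions, not something the profiler reports.

```cpp
// Host-only arithmetic; no CUDA runtime calls involved.
constexpr int kWarpSize        = 32;    // threads per warp
constexpr int kRegsPerSM       = 8192;  // register file size quoted in the post
constexpr int kWarpGranularity = 2;     // ASSUMPTION: warps are allocated in pairs
constexpr int kRegAllocUnit    = 256;   // ASSUMPTION: registers are reserved in chunks of 256

// Round value up to the next multiple of `multiple`.
constexpr int round_up(int value, int multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// Naive estimate: registers per thread times threads per block.
constexpr int naive_registers(int regs_per_thread, int threads_per_block) {
    return regs_per_thread * threads_per_block;
}

// Hypothetical per-block estimate under the two granularity assumptions above.
constexpr int assumed_block_registers(int regs_per_thread, int threads_per_block) {
    return round_up(
        round_up((threads_per_block + kWarpSize - 1) / kWarpSize, kWarpGranularity)
            * kWarpSize * regs_per_thread,
        kRegAllocUnit);
}

// True if one block would fit in the register file under the assumed model.
constexpr bool assumed_fits(int regs_per_thread, int threads_per_block) {
    return assumed_block_registers(regs_per_thread, threads_per_block) <= kRegsPerSM;
}
```

Under those assumptions, a 96-thread block would need 10496 registers at 81 registers per thread and 9216 at 72, both above 8192, while a 64-thread block at 81 registers would need 5376. If that model is right, it would match the behaviour I see, but I have not confirmed that the hardware allocates this way.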