My CUDA project compiled, but when I ran it, one kernel reported "too many resources requested for launch". So I recompiled with --ptxas-options="-v" and found that the offending kernel requires:
59 registers
256+256 bytes smem
132 bytes cmem[1]
8 bytes cmem[14]
However, the occupancy calculator says this is fine for the block size of 128 I am using. It also says a block size of 64 is fine (and even better in some respects), but I get the same error with a block size of 64 as well.
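To double-check the calculator's verdict by hand, here is a minimal sketch of the per-block arithmetic. It assumes compute capability 1.3 limits (16384 registers and 16 KB of shared memory per multiprocessor, at most 512 threads per block, 8 blocks and 32 warps per multiprocessor). It also assumes the calculator's allocation rules: registers are granted per block in 512-register units after rounding the warp count up to a multiple of 2, and shared memory is granted in 512-byte units. The function and namespace names are hypothetical, not part of any CUDA API.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Assumed compute capability 1.3 (Tesla C1060) limits and allocation rules,
// as used by the CUDA Occupancy Calculator. Names here are illustrative only.
namespace cc13 {
constexpr int kThreadsPerWarp       = 32;
constexpr int kMaxThreadsPerBlock   = 512;
constexpr int kMaxWarpsPerSm        = 32;
constexpr int kMaxBlocksPerSm       = 8;
constexpr int kRegsPerSm            = 16384;
constexpr int kRegAllocUnit         = 512;   // registers granted per block in 512-register chunks
constexpr int kWarpAllocGranularity = 2;     // warp count rounded up to a multiple of 2
constexpr int kSmemPerSm            = 16384; // bytes
constexpr int kSmemAllocUnit        = 512;   // bytes
}  // namespace cc13

static int ceilDiv(int x, int y) { return (x + y - 1) / y; }
static int roundUp(int x, int unit) { return ceilDiv(x, unit) * unit; }

// Registers reserved for one block of `threads` threads (hypothetical helper).
int regsPerBlock(int threads, int regsPerThread) {
    using namespace cc13;
    assert(threads > 0 && regsPerThread >= 0);
    int warps = roundUp(ceilDiv(threads, kThreadsPerWarp), kWarpAllocGranularity);
    return roundUp(warps * kThreadsPerWarp * regsPerThread, kRegAllocUnit);
}

// Could a single block with these resources be launched at all?
bool fitsOneBlock(int threads, int regsPerThread, int smemBytes) {
    using namespace cc13;
    return threads <= kMaxThreadsPerBlock &&
           regsPerBlock(threads, regsPerThread) <= kRegsPerSm &&
           roundUp(smemBytes, kSmemAllocUnit) <= kSmemPerSm;
}

// Resident blocks per multiprocessor, split by limiting resource
// (intended to mirror the calculator's "Limited by ..." rows).
struct BlockLimits { int byWarps, byRegs, bySmem, active; };

BlockLimits blocksPerSm(int threads, int regsPerThread, int smemBytes) {
    using namespace cc13;
    BlockLimits r;
    r.byWarps = std::min(kMaxBlocksPerSm, kMaxWarpsPerSm / ceilDiv(threads, kThreadsPerWarp));
    int regs = regsPerBlock(threads, regsPerThread);
    r.byRegs = regs > 0 ? kRegsPerSm / regs : kMaxBlocksPerSm;
    int smem = roundUp(smemBytes, kSmemAllocUnit);
    r.bySmem = smem > 0 ? kSmemPerSm / smem : kMaxBlocksPerSm;
    r.active = std::min({r.byWarps, r.byRegs, r.bySmem});
    return r;
}
```

With the ptxas numbers above (59 registers, 256+256 = 512 bytes of shared memory), regsPerBlock(128, 59) comes to 7680 and regsPerBlock(64, 59) to 4096. Both are well under 16384, so by this accounting a single block should launch at either size. blocksPerSm gives 2 and 4 resident blocks respectively, in both cases limited by registers. To cross-check the ptxas -v figures against the binary that actually runs, the runtime's cudaFuncGetAttributes reports numRegs and maxThreadsPerBlock for a compiled kernel.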
So what is going on?
The GPU is a C1060.
P.S. In the calculator's "Limited by" table, what does it mean when a number is highlighted in red?