Why does the maxrregcount command-line option round up to the nearest multiple of 4 registers (page 20 of NVCC_1.0.pdf) when 100% occupancy requires no more than 10 registers per thread?
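As a minimal sketch of the rounding the question describes, assuming a requested cap is simply rounded up to the next multiple of 4 (the function name is hypothetical and not part of nvcc), the awkward case is easy to see: asking for 10 registers yields 12.

```python
def rounded_reg_limit(requested: int, granularity: int = 4) -> int:
    """Hypothetical model: round a requested register cap up to the next
    multiple of `granularity`, as the question says maxrregcount does."""
    return -(-requested // granularity) * granularity  # ceiling division

for req in (8, 9, 10, 11, 12):
    print(req, "->", rounded_reg_limit(req))
```

Under this model, 9, 10 and 11 all become 12, so 8 is the largest cap below 12 that the option can actually express.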
I've been waiting for someone else to ask this since 0.9 came out. There's been no answer here; could that be because the answer is a bit awkward? My immediate thought was that the register count gets rounded up because of hardware or loader limitations. Has anyone done careful measurements of the performance difference between 10 and 11 or 12 registers?
Edit: 100% occupancy would still be possible; 8 registers would just be the effective ceiling…
Edit: OK, I checked, and the cutoff for 100% occupancy is definitely 10 registers, so it does seem a strange limitation.
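The 10-register cutoff can be reproduced with a rough sketch, assuming G80 (compute capability 1.0) limits of 8192 registers and 768 resident threads (24 warps) per multiprocessor. It treats registers as the only limiting resource and counts at warp granularity, ignoring per-block allocation and the blocks-per-multiprocessor limit, so real occupancy can be equal or lower but never higher.

```python
# Assumed G80 (compute capability 1.0) per-multiprocessor limits.
REGS_PER_SM = 8192
WARP_SIZE = 32
MAX_WARPS_PER_SM = 24  # 768 threads / 32 threads per warp

def occupancy(regs_per_thread: int) -> float:
    """Upper-bound occupancy when registers are the only limiting resource."""
    warps_by_regs = REGS_PER_SM // (regs_per_thread * WARP_SIZE)
    return min(MAX_WARPS_PER_SM, warps_by_regs) / MAX_WARPS_PER_SM

for regs in (8, 10, 11, 12):
    print(f"{regs:2d} regs -> {occupancy(regs):.1%}")
```

With these assumptions, 8 and 10 registers both give 100%, 11 gives at most 95.8% and 12 at most 87.5%, which is why a cap that rounds 10 up to 12 costs occupancy.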