Hi,
I have a kernel that uses 18 regs. According to the occupancy calculator, if I use one less reg I go from 56% to 84%. Now I know it doesn't mean
I'll get better performance, but it seems worthwhile trying. Using --maxrregcount didn't help.
I have some extra smem to spare and thought that if I defined a smem array of blockDim.x size and used it as a temporary array instead
of a register, that would save the extra register - but I keep getting the same 18 regs.
Any suggestions? How should I go about this? The ptx file is too complex for me for now :)
You're actually fighting the optimization logic in the compiler, which is also working to minimize register use. If you really want to spend the time on this issue, you could use decuda to disassemble the .cubin file and get an idea of what the optimizer did with the registers. The .ptx file that you mention is actually preliminary assembly code in which each virtual register is assigned only once (that makes it easier for said optimizer to do its thing), so the register count you see there is not the final one.
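For reference, here is a rough sketch of the smem-as-temporary idea from the question, so there is something concrete to compile and compare. The kernel name scaleAdd, the arithmetic, and the block size of 128 are made up for illustration; they are not from the original kernel. Building with nvcc --ptxas-options=-v prints the register count per kernel, so you can check whether the change made any difference. As said above, the compiler is free to keep the value in a register anyway, so there is no guarantee the count drops.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the per-thread temporary is stored in shared memory
// instead of a local variable. The array is sized at launch time
// (blockDim.x floats), hence the extern declaration.
__global__ void scaleAdd(const float *in, float *out, int n)
{
    extern __shared__ float tmp[];   // one slot per thread in the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Each thread touches only its own slot, so no __syncthreads() is needed.
        tmp[threadIdx.x] = in[i] * 2.0f;     // was: float t = in[i] * 2.0f;
        out[i] = tmp[threadIdx.x] + 1.0f;    // was: out[i] = t + 1.0f;
    }
}

int main()
{
    const int n = 1024;
    const int threads = 128;                      // assumed block size
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(float);

    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    scaleAdd<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);

    // This copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // Verify against the same computation done on the host.
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[i] * 2.0f + 1.0f) ++errors;
    printf("mismatches: %d\n", errors);

    cudaFree(d_in);
    cudaFree(d_out);
    return errors != 0;
}
```

If the register count stays the same after a change like this, that usually means the optimizer either kept the value in a register or needed that register for something else (addressing, for example), which is where disassembling the .cubin becomes useful.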