While trying to figure out why my code is slow, I ran decuda on my .cubin file to see what the core routines are actually doing. Most of the output is quite understandable, but one thing puzzles me.
In these two lines of code:
add.half.rn.f32 $r4, s[$ofs2+0x0034], -$r78
add.half.rn.f32 $r5, s[$ofs2+0x0030], -$r79
the registers $r78 and $r79 are used. But I compile with -maxrregcount=20, so only registers $r0 to $r19 ought to be used. No matter what I try, I can’t get rid of these two registers. Can anyone explain why they show up despite the register limit?