Register usage in CUDA


This is a follow-up to my last post, and I am getting really puzzled by the CUDA compiler's behavior.

In the first case, I have several calculations of the following type:

var_x -= var_y * var_z;

Now, this ‘-=’ operation seems to use one more register than a usual arithmetic operation. I have 27 such operations one after the other, and I end up using a crazy number of registers (38). I moved whatever variables I could into shared memory and tried different combinations, but I am unable to bring the register count down.
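
To make the pattern concrete, here is a minimal sketch rather than the actual kernel. The kernel name, the use of float, the array arguments, and the launch bound of 256 threads are all assumptions made for illustration. It writes the multiply-subtract explicitly as a fused multiply-add, which makes it easy to compare the register count the compiler reports for each form; it is not a claim that this form uses fewer registers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the name, the float type, and the launch bound
// of 256 threads per block are assumptions, not taken from the post.
__global__ void __launch_bounds__(256)
multiply_subtract(float *x, const float *y, const float *z, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float var_x = x[i];
        // Same pattern as above (var_x -= var_y * var_z;),
        // written explicitly as a fused multiply-add: var_x + (-var_y) * var_z.
        var_x = fmaf(-y[i], z[i], var_x);
        x[i] = var_x;
    }
}

int main()
{
    const int n = 1024;
    float *x, *y, *z;

    // Managed memory keeps the host code short for this sketch.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&z, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        x[i] = 10.0f;
        y[i] = 2.0f;
        z[i] = 3.0f;
    }

    multiply_subtract<<<(n + 255) / 256, 256>>>(x, y, z, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);

    cudaFree(x);
    cudaFree(y);
    cudaFree(z);
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` prints the number of registers used per kernel, so the effect of each change can be checked directly. The `-maxrregcount=N` option caps the register count for the whole file, at the cost of possible spills to local memory.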

Is there some trick for this kind of arithmetic operation? I can post my kernel if it helps.