Reducing Registers?

Does anyone have a general formula, framework, or procedure for reducing the register count in a kernel? I understand that increasing occupancy may not yield better results if the problem is bandwidth bound, so no need to reiterate that.

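Recalculating values may help: a value that is recomputed where it is needed does not have to sit in a register in the meantime.
Sprinkling if (0) {} around sometimes helps as well, as does calling __syncthreads(); though the effect of either depends on the compiler version and is hard to predict.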
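
To make the recalculation idea concrete, here is a minimal sketch. The kernel names, the arithmetic, and the host helper variants_agree are all made up for illustration. In a case this small the compiler will probably produce the same code for both variants, so treat it as a pattern to try on a real kernel and compare the register counts reported by nvcc -Xptxas -v.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Variant A: `scale` is computed up front, so as written it stays live
// across the whole loop before it is finally used.
__global__ void weighted_sum_keep(const float* x, float* y, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float scale = a * a + 1.0f;
    float acc = 0.0f;
    for (int k = 0; k < 8; ++k) acc += x[i] * (k + 1);
    y[i] = acc * scale;
}

// Variant B: the same value is recomputed at its single point of use,
// so nothing extra has to be held while the loop runs.
__global__ void weighted_sum_recompute(const float* x, float* y, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < 8; ++k) acc += x[i] * (k + 1);
    y[i] = acc * (a * a + 1.0f);
}

// Hypothetical host helper: runs both variants on the same input and
// returns true if their outputs match element by element.
bool variants_agree(int n) {
    const float a = 2.0f;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> hx(n, 1.0f), ha(n, 0.0f), hb(n, 0.0f);
    float *dx = nullptr, *da = nullptr, *db = nullptr;

    bool ok = cudaMalloc(&dx, bytes) == cudaSuccess &&
              cudaMalloc(&da, bytes) == cudaSuccess &&
              cudaMalloc(&db, bytes) == cudaSuccess &&
              cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        weighted_sum_keep<<<blocks, threads>>>(dx, da, n, a);
        weighted_sum_recompute<<<blocks, threads>>>(dx, db, n, a);
        ok = cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(ha.data(), da, bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(hb.data(), db, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // cudaFree accepts null pointers, so this is safe on every path.
    cudaFree(dx);
    cudaFree(da);
    cudaFree(db);
    if (!ok) return false;

    for (int i = 0; i < n; ++i) {
        if (std::fabs(ha[i] - hb[i]) > 1e-5f) return false;
    }
    return true;
}
```

Calling variants_agree(1024) from a host program checks that the rewrite did not change the results. Whether it changed the register count is a separate question that only the ptxas output can answer.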

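Another way is to pass -maxrregcount=xxx as an option to nvcc. If the limit is set too low, the compiler spills registers to local memory, which can cost more than the extra occupancy gains.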
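
As a rough sketch of how the compiler limit can be applied and checked: nvcc spells the option --maxrregcount (two r's), and it applies to every kernel in the file. A per-kernel alternative, not mentioned above, is the __launch_bounds__ qualifier. The kernel name, the helper regs_per_thread, the file name kernel.cu, and the numbers 32, 256 and 4 are placeholder assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// File-wide cap, with per-kernel register usage printed at compile time:
//   nvcc -Xptxas -v --maxrregcount=32 kernel.cu
// (32 and kernel.cu are placeholders.)

// Per-kernel alternative: promise at most 256 threads per block and ask for
// at least 4 resident blocks per multiprocessor. The compiler derives a
// register limit for this kernel from those two numbers.
__global__ void __launch_bounds__(256, 4)
saxpy_bounded(const float* x, float* y, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: asks the runtime how many registers per thread the
// compiled kernel uses. Returns -1 if the query fails.
int regs_per_thread() {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, saxpy_bounded) != cudaSuccess) return -1;
    return attr.numRegs;
}
```

The value returned by regs_per_thread() should match the register count that -Xptxas -v prints for the kernel, so either one can be used to see whether a change actually lowered it. If the ptxas output also reports spill stores and loads, the limit is probably too tight.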