A variable at the C level does not correspond directly to a register. The compiler frequently needs somewhere to store the intermediate values of a calculation (a single C statement can compile down to many PTX instructions), and this can push register usage up, especially if your code contains complex expressions. On the other hand, the assembler (ptxas) is also free to reuse one register for several variables whose lifetimes do not overlap, which can bring register usage back down. In general, there is only a weak correlation between the number of variables at the C level and the number of registers required on the device.
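One way to see this for yourself is to write the same arithmetic twice, once with many named temporaries and once as a single expression, and ask the runtime how many registers ptxas actually assigned to each kernel. The sketch below does that with cudaFuncGetAttributes; the kernel names and the particular expression are made up for illustration, and the counts you get will depend on your GPU architecture and CUDA version, so no specific result is claimed here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel A: the calculation spelled out with five named temporaries.
__global__ void named_temps(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a = x[i];
        float b = a * a;
        float c = b + 1.0f;
        float d = c * a;
        float e = d - b;
        y[i] = e;
    }
}

// Hypothetical kernel B: the same arithmetic, (a*a + 1)*a - a*a, as one expression.
__global__ void one_expression(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = (x[i] * x[i] + 1.0f) * x[i] - x[i] * x[i];
    }
}

// Print and return the per-thread register count ptxas assigned to a kernel,
// or -1 if the attributes could not be queried (e.g. no usable device).
static int print_regs(const char* name, const void* kernel)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", name, cudaGetErrorString(err));
        return -1;
    }
    std::printf("%-15s %d registers per thread\n", name, attr.numRegs);
    return attr.numRegs;
}

int main()
{
    int a = print_regs("named_temps", (const void*)named_temps);
    int b = print_regs("one_expression", (const void*)one_expression);
    return (a < 0 || b < 0) ? 1 : 0;
}
```

Build it with something like `nvcc -o regs regs.cu` and run it. Because the compiler sees through the named temporaries, the two versions will often end up with the same or very similar counts, which is the weak correlation described above.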
You can force the compiler to use fewer registers per thread (for example, so that more threads fit on each multiprocessor) with the --maxrregcount option to nvcc. If the limit is below what a kernel needs, the compiler has to put intermediate results into local memory, which, confusingly, is stored in the same off-chip device memory as global memory, so the resulting spills can slow things down. You can experiment and see whether it helps in your case.
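A minimal way to run that experiment, assuming a kernel that keeps many values live at once: build the program below with and without a register cap and compare what it reports. The kernel `many_accumulators` is hypothetical; it holds 16 accumulators across a loop so that it tends to need a fair number of registers. `numRegs` is the per-thread register count, and `localSizeBytes` is the per-thread local memory, which includes any spill slots introduced by the cap.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical register-hungry kernel: 16 accumulators stay live across the
// loop. With full unrolling and constant indices, acc[] normally lives in
// registers unless a register cap forces some of it to spill.
__global__ void many_accumulators(const float* x, float* y, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc[16];
    #pragma unroll
    for (int k = 0; k < 16; ++k) acc[k] = x[i] * (k + 1);

    for (int t = 0; t < iters; ++t) {
        #pragma unroll
        for (int k = 0; k < 16; ++k) acc[k] = acc[k] * acc[(k + 1) % 16] + 1.0f;
    }

    float sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < 16; ++k) sum += acc[k];
    y[i] = sum;
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, (const void*)many_accumulators);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // numRegs: registers per thread after ptxas register allocation.
    // localSizeBytes: per-thread local memory, including register spills.
    std::printf("registers per thread:    %d\n", attr.numRegs);
    std::printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

For example, compare `nvcc -o regs regs.cu` against `nvcc -maxrregcount=16 -o regs16 regs.cu`. Adding `-Xptxas -v` to either build makes ptxas print the register count and the spill stores/loads at compile time, which is often quicker than running the program. Whether the capped build is faster still has to be measured by timing the real kernel, since spills to local memory and higher occupancy pull in opposite directions.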