I am profiling some double-precision code on Fermi, and the profiler shows me:
regperthread [ 63 ],
occupancy [ 0.333 ].
This means 512 resident threads per multiprocessor, each using 63 registers, for a total of 32,256 registers, which is a little less than 32K.
But I am working in double precision, so shouldn't the maximum be 16K registers?
Am I misunderstanding something about the architecture?
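
For reference, here is a rough sketch of the arithmetic behind those numbers. It is only a sanity check, not a statement of how the profiler computes anything. The per-multiprocessor limits (48 resident warps of 32 threads, and 32K registers of 32 bits each) are assumptions taken from the published Fermi (compute capability 2.0) figures and do not come from the profiler output. The function names are made up for illustration.

```python
# Assumed Fermi (compute capability 2.0) limits per multiprocessor.
# These are NOT from the profiler output; they are assumptions for this sketch.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 48            # i.e. 1536 resident threads
REGISTERS_PER_SM = 32 * 1024     # assumed to be counted in 32-bit units


def resident_threads(occupancy):
    """Convert a reported occupancy into resident threads, in whole warps."""
    warps = round(occupancy * MAX_WARPS_PER_SM)
    return warps * WARP_SIZE


def registers_in_use(threads, regs_per_thread):
    """Total registers needed if every resident thread uses regs_per_thread."""
    return threads * regs_per_thread


threads = resident_threads(0.333)            # occupancy [ 0.333 ]
registers = registers_in_use(threads, 63)    # regperthread [ 63 ]

print("resident threads:", threads)
print("registers in use:", registers)
print("assumed register file size:", REGISTERS_PER_SM)
print("fits in register file:", registers <= REGISTERS_PER_SM)
```

Under these assumptions the reported occupancy matches 16 warps, and the total lands just under the assumed 32K limit. If the `regperthread` counter were in 64-bit units instead, the same 512 threads would not fit, which is why I am asking how the counter should be read for double-precision code.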