Profiling double precision codes: profiler readings for DP codes seem fishy

I am profiling some double precision codes on Fermi, and the profiler shows me:

  • regperthread [ 63 ],
  • occupancy [ 0.333 ].

This means 512 threads, each using 63 registers, which comes to a little less than 32K registers (512 × 63 = 32,256).
But I am doing my work in double precision, so shouldn't the maximum be 16K registers?
Am I misunderstanding something about the architecture?
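The numbers in the question can be reproduced with a little arithmetic. This is a rough sketch that assumes the Fermi (compute capability 2.x) per-SM limits of 32768 32-bit registers and 48 resident warps (1536 threads). It treats registers as the only limit and ignores allocation granularity, block size, shared memory and the per-SM block limit, so it is not a substitute for the official occupancy calculator. The function name `fermi_occupancy` is hypothetical.

```python
# Assumed Fermi (compute capability 2.x) per-SM limits, for illustration only.
REGS_PER_SM = 32 * 1024   # 32-bit registers per SM
MAX_WARPS_PER_SM = 48     # 1536 threads / 32 threads per warp
WARP_SIZE = 32


def fermi_occupancy(regs_per_thread):
    """Return (resident threads, occupancy) when registers are the only limit.

    Simplification: ignores register allocation granularity, block size,
    shared memory usage and the maximum number of blocks per SM.
    """
    regs_per_warp = regs_per_thread * WARP_SIZE
    warps = min(REGS_PER_SM // regs_per_warp, MAX_WARPS_PER_SM)
    return warps * WARP_SIZE, warps / MAX_WARPS_PER_SM


threads, occ = fermi_occupancy(63)
print(threads, round(occ, 3))            # 512 0.333
print(threads * 63, "of", REGS_PER_SM)   # 32256 of 32768
```

With 63 registers per thread, only 16 of the 48 warp slots fit in the register file, which matches the reported occupancy of 0.333.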

Registers are always 32 bits wide. If you use 64-bit types such as double, each value consumes two registers.
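A small bookkeeping sketch of what that means for a register count: the budget is counted in 32-bit registers, and a double simply takes a pair of them, so using doubles does not change the 32K limit itself. The mix of 20 doubles and 23 32-bit values below is made up purely for illustration, and the helper `regs_needed` is hypothetical; the real allocation is decided by the compiler.

```python
import math

REG_BITS = 32  # every register is 32 bits wide


def regs_needed(bit_widths):
    """Count 32-bit registers for values of the given bit widths.

    Assumes each value occupies whole registers, so a 64-bit double
    takes a pair of registers and a 32-bit int or float takes one.
    """
    return sum(math.ceil(bits / REG_BITS) for bits in bit_widths)


# Hypothetical per-thread mix: 20 doubles plus 23 32-bit values
# (loop counters, indices, floats). Not taken from the original kernel.
live_values = [64] * 20 + [32] * 23
print(regs_needed(live_values))  # 63
```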

I see. I was reading the profiler output incorrectly.

What it means is that the kernel uses 63 32-bit registers per thread: some of them are paired up to store double precision numbers, some are used for loop counters, etc.

My bad, I am still in the CPU mindset.

Thanks for the reply.