# of registers for different datatypes

Hi,

Our program can switch between different precision modes (half, float, double). We noticed that the number of registers changes only slightly between half and float/double. We would expect far fewer registers to be used in half precision. Why isn’t that the case? We compile on a DGX-2 with the PGI 19.10 compiler.

half

ptxas info    : Function properties for kernel1
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 221 registers, 404 bytes cmem[0]

float

ptxas info    : Function properties for kernel1
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 253 registers, 408 bytes cmem[0]

double

ptxas info    : Function properties for kernel1
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 250 registers, 416 bytes cmem[0]

Thanks for your help.

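In CUDA GPUs, registers are 32 bits wide. A single 16-bit value still occupies a full register, so converting an item from 32 bits to 16 bits brings no benefit for register usage. You would need to carefully use half2, which packs two half values into one 32-bit register, to get any benefit in terms of register pressure.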
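To illustrate the half2 point, here is a minimal CUDA C++ sketch (not CUDA Fortran, and not the original poster's kernel). The kernel names `axpy_half2`, `fill_half2` and `unpack_first` and the input values are made up for the example. It assumes a GPU with compute capability 5.3 or higher, which half arithmetic requires; the V100s in a DGX-2 qualify. Each thread handles one packed pair, so two half values share one 32-bit register.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: y = a * x + y, one packed pair of halves per thread.
// Each __half2 holds two 16-bit values in a single 32-bit register.
__global__ void axpy_half2(int n_pairs, float a, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs) {
        __half2 a2 = __float2half2_rn(a);   // broadcast a into both halves
        y[i] = __hfma2(a2, x[i], y[i]);     // fused multiply-add on both halves
    }
}

// Hypothetical helper: fill x with (1, 2) pairs and y with (3, 4) pairs.
__global__ void fill_half2(int n_pairs, __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs) {
        x[i] = __floats2half2_rn(1.0f, 2.0f);
        y[i] = __floats2half2_rn(3.0f, 4.0f);
    }
}

// Hypothetical helper: convert the first pair back to float for printing.
__global__ void unpack_first(const __half2* y, float2* out)
{
    *out = __half22float2(y[0]);
}

int main()
{
    const int n_pairs = 1024;               // 2048 half values in total
    const int threads = 256;
    const int blocks = (n_pairs + threads - 1) / threads;

    __half2 *x, *y;
    float2* d_out;
    cudaMalloc(&x, n_pairs * sizeof(__half2));
    cudaMalloc(&y, n_pairs * sizeof(__half2));
    cudaMalloc(&d_out, sizeof(float2));

    fill_half2<<<blocks, threads>>>(n_pairs, x, y);
    axpy_half2<<<blocks, threads>>>(n_pairs, 2.0f, x, y);
    unpack_first<<<1, 1>>>(y, d_out);

    float2 h_out;
    cudaMemcpy(&h_out, d_out, sizeof(float2), cudaMemcpyDeviceToHost);
    printf("%.1f %.1f\n", h_out.x, h_out.y); // 2*1+3 and 2*2+4: 5.0 8.0

    cudaFree(x);
    cudaFree(y);
    cudaFree(d_out);
    return 0;
}
```

Compiling with something like `nvcc -arch=sm_70 -Xptxas -v` prints the same kind of `ptxas info` lines as in the question, so the register count of a packed kernel can be compared with a scalar one. Any saving depends on how much of the kernel's live data can actually be paired up.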

Thank you for your explanation! Is half2 also available in CUDA Fortran? I can’t find any information about it for Fortran.

After looking through https://www.pgroup.com/resources/docs/19.10/x86/pgi-ref-guide/index.htm#data-types, I believe the answer is no.