Debug flags register usage

Hi,
I created an application that runs in three different precisions: FP64, FP32, and FP16. I use the same compilation line for all three. The build line is:

nvcc -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -O3 -std=c++11 -gencode arch=compute_61,code=[sm_61,compute_61] --resource-usage ./cuda_half_mxm.cu -o ./cuda_mxm_half

The problem is that I get different register usage with and without the -g -G flags.
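To make the comparison explicit, here is a sketch of the two builds being compared. It assumes the debug build is the same line with -g -G added. The variable names and the _debug output name are only for illustration and are not part of my actual setup.

```shell
# Options shared by both builds, copied from the build line above.
COMMON='-I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -O3 -std=c++11 -gencode arch=compute_61,code=[sm_61,compute_61] --resource-usage'

# Build without debug flags.
RELEASE_CMD="nvcc $COMMON ./cuda_half_mxm.cu -o ./cuda_mxm_half"

# Build with debug flags: -g adds host debug info, -G adds device debug
# info and turns off most device-code optimizations.
# The _debug output name is an assumption for illustration.
DEBUG_CMD="nvcc $COMMON -g -G ./cuda_half_mxm.cu -o ./cuda_mxm_half_debug"

# Print both command lines so they can be compared or pasted into a terminal.
echo "$RELEASE_CMD"
echo "$DEBUG_CMD"
```

With --resource-usage present in both, each build prints its own ptxas register report.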

For FP16 without -g -G
ptxas info : Used 32 registers, 348 bytes cmem[0]

For FP16 with -g -G
ptxas info : Used 39 registers, 72 bytes cumulative stack size, 348 bytes cmem[0], 28 bytes cmem[2]

For FP32 without -g -G
ptxas info : Used 32 registers, 348 bytes cmem[0]

For FP32 with -g -G
ptxas info : Used 16 registers, 348 bytes cmem[0]

For FP64 without -g -G
ptxas info : Used 32 registers, 348 bytes cmem[0]

For FP64 with -g -G
ptxas info : Used 18 registers, 348 bytes cmem[0]

Does anyone know why register usage varies for the same application with and without the debug flags?

Also, is it right to expect register usage to be smaller for FP16?

Thanks

Hi, fernandofernandesant

Sorry for the late reply.
However, I don't think this question belongs in this section, which is for questions about using the debugger tools.
Could you please post your question in the CUDA Programming section instead?