Hi all,
I was running a kernel with 512 threads and 128 blocks. The register count from the cubin file generated with the -keep option in a makefile is
name = __globfunc__Z8flo_tfmuP6float2S0_ii
lmem = 0
smem = 8224
reg = 11
bar = 1
whereas the cubin file generated with the following command gives different values:
nvcc -cubin -I ~uns/NVIDIA_CUDA_SDK/common/inc/ -O3 main.cu
name = __globfunc__Z8flo_tfmuP6float2S0_ii
lmem = 56
smem = 8224
reg = 23
bar = 1
Which one is the true value, and why do the two differ? Can somebody please clarify?
The cubin shows the true register count. nvcc displays the number of registers allocated sequentially in the PTX (without re-use), whereas the final cubin re-uses registers where possible. The register optimization happens during the translation from PTX into the cubin assembly.
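As a cross-check on what the cubin reports, the runtime can be asked for the resource usage of the compiled kernel. The following is only a sketch: the kernel signature is inferred from the mangled name in the question, the parameter names and the kernel body are placeholders, and it assumes a toolkit recent enough to provide cudaFuncGetAttributes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question. Only the signature
// (float2*, float2*, int, int) comes from the mangled name; the parameter
// names and the body are placeholders for illustration.
__global__ void flo_tfmu(float2 *out, float2 *in, int n, int m)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n * m) {
        out[i].x = in[i].x * 2.0f;
        out[i].y = in[i].y * 2.0f;
    }
}

int main()
{
    // Ask the runtime what the final machine code for this kernel uses.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, flo_tfmu);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("registers per thread : %d\n", attr.numRegs);
    printf("static shared memory : %zu bytes\n", attr.sharedSizeBytes);
    printf("local memory/thread  : %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

The numbers printed correspond to the reg, smem and lmem fields of the cubin for whichever set of compiler flags the binary was actually built with, so building it once through the makefile and once with the manual command should show which build produces which count.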
It is also possible to lower the register count at the PTX level: search this forum for the volatile keyword. This often results in lower register usage in the final cubin code (no machine optimizer is perfectly efficient, so lowering the complexity of the problem fed into the optimizer may help).
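A minimal sketch of the volatile trick mentioned above, assuming a simple element-wise kernel (the kernel name, parameters and body here are made up for illustration and are not the poster's code). Marking index temporaries as volatile asks the compiler to compute each value once and keep it, rather than re-deriving it at every use; whether that actually lowers the final count depends on the compiler version, so compare the cubin before and after.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: scales a width x height array of float2 values.
__global__ void scale_rows(float2 *out, const float2 *in,
                           int width, int height, float factor)
{
    // volatile on the temporaries discourages the compiler from inlining
    // the index arithmetic separately into every later use.
    volatile int col = blockIdx.x * blockDim.x + threadIdx.x;
    volatile int row = blockIdx.y;

    if (col < width && row < height) {
        volatile int idx = row * width + col;
        float2 v = in[idx];
        v.x *= factor;
        v.y *= factor;
        out[idx] = v;
    }
}
```

Rebuilding with and without the volatile qualifiers and comparing the reg and lmem fields of the two cubin files shows whether the change helped for a given kernel.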