Difference between the register usage information shown in the PTX file and the cubin file

We can find the per-thread register usage and the shared memory usage in the cubin file:

name = Kernel

reg  = 18

It is clear that the code uses 18 registers per thread.

But when I check the PTX code, I also find some information about registers:

.reg .u16 %rh<4>;
.reg .u32 %r<74>;
.reg .f32 %f<75>;
.reg .pred %p<9>;

So, what's the difference between them? What does the information in the PTX code mean? Is it the number of registers? Thanks!

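The .reg directives in PTX just declare virtual register variables of different types in the register state space. For example, .reg .u32 %r<74> declares 74 virtual 32-bit registers named %r0 through %r73. These declarations have no direct relationship to the number of physical registers that the PTX assembler (ptxas) will use in the final code; the reg = 18 value in the cubin is the physical count after register allocation. You can read more about PTX register spaces in Ch5 of the PTX reference guide if you are interested.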
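To illustrate why the number of declared virtual registers and the physical register count can differ so much, here is a toy model in Python. It is only a sketch: the instruction list, the register names, and the max_live helper are made up for illustration, and the real allocation done by ptxas is far more involved (instruction scheduling, register widths, spilling). The model just counts how many single-assignment values are live at the same time in a straight-line sequence.

```python
def max_live(instrs):
    """Peak number of simultaneously live values in a straight-line program.

    instrs is a list of (dest, sources) pairs in program order. Each dest
    is assigned exactly once; dest is None for an instruction that only
    consumes values (a store, for example).
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for idx, (_, srcs) in enumerate(instrs):
        for s in srcs:
            last_use[s] = idx

    live, peak = set(), 0
    for idx, (dest, srcs) in enumerate(instrs):
        # A source read for the last time here frees its register,
        # which the destination of this instruction may then reuse.
        live -= {s for s in srcs if last_use[s] == idx}
        # Only values that are read later need to stay in a register.
        if dest is not None and dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


# Hypothetical instruction stream, loosely modelled on PTX.
prog = [
    ("%f1", []),              # ld
    ("%f2", []),              # ld
    ("%f3", ["%f1", "%f2"]),  # add
    ("%f4", ["%f3", "%f3"]),  # mul
    ("%f5", []),              # ld
    ("%f6", ["%f4", "%f5"]),  # add
    ("%f7", ["%f6", "%f6"]),  # mul
    (None,  ["%f7"]),         # st
]

declared = len({d for d, _ in prog if d is not None})
print("virtual registers declared:", declared)
print("peak live values:", max_live(prog))
```

In this toy program seven virtual registers are declared, but at most two values are live at once, so two physical registers would be enough in this model. The same effect, on a larger scale, is why a %r<74> declaration can coexist with reg = 18.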

So, if I optimize the PTX code by hand, will that affect performance?

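Yes, you can optimize the PTX code, but usually not to save registers. Compiler-generated PTX is in static single assignment (SSA) form, which means that each variable (virtual register) is assigned only once. That is why so many virtual registers are declared, and why removing some of them by hand has little effect: ptxas performs the actual register allocation afterwards.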

To modify your PTX code, first use the dry-run option of nvcc (nvcc --dryrun). The output lists the commands that nvcc runs in the background. Run these commands by hand up to the point where the PTX is generated, modify the PTX, and then run the remaining commands. Et voilà, you have a program with hand-optimized PTX code.

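You can save registers indirectly by performing optimizations that split value live ranges, such as rematerialization (recomputing a cheap value close to where it is used instead of keeping it in a register the whole time).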
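As a rough illustration of how rematerialization shortens a live range, the following Python sketch compares two versions of a made-up straight-line program. Everything here is an assumption for the sake of the example: the instruction streams, the value names, and the peak_live helper are hypothetical, and the model ignores what ptxas actually does (scheduling, spilling, register widths). It only counts how many values are live at once.

```python
def peak_live(instrs):
    """Peak number of simultaneously live values in a straight-line program.

    instrs is a list of (dest, sources) pairs; each dest is assigned once,
    and dest is None for an instruction that only consumes values.
    """
    # Index of the last instruction that reads each value.
    last_use = {}
    for idx, (_, srcs) in enumerate(instrs):
        for s in srcs:
            last_use[s] = idx

    live, peak = set(), 0
    for idx, (dest, srcs) in enumerate(instrs):
        # Values read for the last time here free their registers.
        live -= {s for s in srcs if last_use[s] == idx}
        if dest is not None and dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


# Original: the cheap value c is used early and again late, so it stays
# live across the whole high-pressure region in between.
before = [
    ("c", []),          # mov (cheap to recompute)
    ("a", []),          # ld
    ("x", ["a", "c"]),  # mul, early use of c
    ("b", []),          # ld
    ("d", []),          # ld
    ("t", ["b", "d"]),  # add
    ("u", ["t", "x"]),  # mul
    ("v", ["u", "c"]),  # add, late use of c
    (None, ["v"]),      # st
]

# Rematerialized: c is split into c1 and c2, and c2 is recomputed right
# before its use, at the cost of one extra instruction.
after = [
    ("c1", []),          # mov
    ("a", []),           # ld
    ("x", ["a", "c1"]),  # mul
    ("b", []),           # ld
    ("d", []),           # ld
    ("t", ["b", "d"]),   # add
    ("u", ["t", "x"]),   # mul
    ("c2", []),          # mov again (rematerialized)
    ("v", ["u", "c2"]),  # add
    (None, ["v"]),       # st
]

print("peak live before:", peak_live(before))
print("peak live after: ", peak_live(after))
```

In this model the peak drops from four live values to three, in exchange for one additional instruction. Whether a similar hand edit in real PTX changes the final register count depends on ptxas, which is free to reschedule and rematerialize on its own.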