I find that the number of registers I get by counting differs from the number reported with ‘-Mcuda=ptxinfo’; the latter is larger. So how can I count the number of registers used by each thread?
I’m not sure what you mean by counting the number of registers. Can you please explain?
The “ptxinfo” output is the diagnostic information given by the back-end device compiler, which does the register allocation, so it should be correct.
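If it helps to compare the reported figure against a hand count for several kernels, the per-thread register number can be pulled out of the diagnostic text. The following is only a minimal sketch, assuming the output follows the usual ptxas "Used N registers" line format; the sample log, the kernel names, the numbers in it, and the helper name `registers_per_kernel` are all hypothetical and not taken from a real build.

```python
import re

# Illustrative ptxas-style diagnostics. The kernel names and numbers are
# made up for this sketch; paste your own compiler output here instead.
SAMPLE_LOG = """\
ptxas info    : Compiling entry function 'kernel_a' for 'sm_70'
ptxas info    : Function properties for kernel_a
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 18 registers, 368 bytes cmem[0]
ptxas info    : Compiling entry function 'kernel_b' for 'sm_70'
ptxas info    : Function properties for kernel_b
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 32 registers, 376 bytes cmem[0]
"""

# Assumed line formats: an "entry function" line names the kernel, and a
# later "Used N registers" line gives its per-thread register count.
ENTRY_RE = re.compile(r"Compiling entry function '([^']+)'")
USED_RE = re.compile(r"Used (\d+) registers")


def registers_per_kernel(log_text):
    """Map each kernel name to the register count reported for it."""
    counts = {}
    current = None
    for line in log_text.splitlines():
        entry = ENTRY_RE.search(line)
        if entry:
            current = entry.group(1)
            continue
        used = USED_RE.search(line)
        if used and current is not None:
            counts[current] = int(used.group(1))
    return counts


if __name__ == "__main__":
    for name, regs in registers_per_kernel(SAMPLE_LOG).items():
        print(f"{name}: {regs} registers per thread")
```

The number on the "Used" line is per thread, so it is the one to compare with a manual count. A manual count of variables in the source will usually not match it, because the device compiler allocates registers for temporaries and address arithmetic as well.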