Hi! I am using the “Source” page in Nsight Compute and tracking register usage with the “Live Registers” column. The total register usage is reported as 149, but the maximum live register count is only 102. Why the difference?
I am using a 3050 PC with CUDA 11.7. Should I always add 47 to the maximum live register count to get the real maximum register usage?
On the Source page, you’re only seeing the live registers from your code. The number in the red box comes from the compiler and could include requirements from system calls, the CUDA runtime, etc. The gap isn’t always going to be the same, so you can’t just assume you’ll always need 47 more, for example.
In the second screenshot, that’s a bug. If you have a reproducer you could share, we could take a look. But we’re always working on trying to improve the quality of this data. Thanks for getting in touch.
The problem is that I think ncu is still using the wrong data for most metrics. Or does the kernel actually use only 103 registers in computation, and not 143? Which number is wrong?
I guess 103 is correct, because this version is much faster than before. If it were 143, the occupancy would drop by 50%, and the kernel would be much slower.
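For anyone who wants to check the occupancy reasoning above, here is a minimal sketch of the register-limited occupancy arithmetic. It is a simplified model, not what ncu computes internally. The assumptions are mine and not from this thread: compute capability 8.6 limits (65536 registers and 48 resident warps per SM), registers per thread rounded up to a multiple of 8, and a hypothetical block size of 256 threads. Shared memory and other limits are ignored.

```python
# Rough register-limited occupancy estimate (simplified model).
# Assumptions, not taken from the thread:
#   - compute capability 8.6: 65536 registers and 48 resident warps per SM
#   - registers per thread are rounded up to a multiple of 8
#   - hypothetical block size of 256 threads
#   - shared memory and per-SM block-count limits are ignored

REGS_PER_SM = 65536
MAX_WARPS_PER_SM = 48
WARP_SIZE = 32


def register_limited_warps(regs_per_thread, threads_per_block=256):
    """Return the resident warps per SM allowed by register usage alone."""
    # Round up to the next multiple of 8 registers per thread.
    regs_rounded = -(-regs_per_thread // 8) * 8
    regs_per_block = regs_rounded * threads_per_block
    # Whole blocks that fit in the register file of one SM.
    blocks = REGS_PER_SM // regs_per_block
    warps = blocks * (threads_per_block // WARP_SIZE)
    return min(warps, MAX_WARPS_PER_SM)


for regs in (103, 143):
    warps = register_limited_warps(regs)
    print(f"{regs} registers/thread -> {warps} warps/SM "
          f"({100 * warps / MAX_WARPS_PER_SM:.0f}% occupancy)")
```

Under these assumptions, 103 registers per thread allows two blocks (16 warps) per SM, while 143 allows only one block (8 warps), which is the kind of halving described above. The result depends strongly on the block size, so substitute your kernel's actual launch configuration before drawing conclusions.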