Hi! I am using the “Source” page in Nsight Compute and tracking register usage with “Live Registers”. The total register usage is reported as 149, but the maximum live register count is only 102. Why is that?
I am using a 3050 PC with CUDA 11.7. Should I always add 47 to the maximum live register count to get the real maximum register usage?
Also, I am profiling a compiled cuBLAS kernel. There the general register usage is 122, but the maximum live register count is 165. How can this happen?
On the Source page, you’re only seeing the live registers from your code. The number in the red box comes from the compiler and can include requirements from system calls, the CUDA runtime, etc. The gap isn’t always going to be the same, so you can’t just assume you’ll always need 47 more, for example.
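If you want to check the compiler-allocated register count outside the profiler, a minimal sketch along these lines may help. It assumes the per-thread count the compiler reports is the number you want to compare against the live register maximum. The kernel `scale` is a hypothetical stand-in, not code from this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; substitute your own matmul kernel here.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Ask the runtime for the attributes the compiler recorded for this kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scale);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // numRegs is the number of registers allocated per thread by the compiler.
    printf("registers per thread: %d\n", attr.numRegs);
    // A non-zero local size can indicate spilling or per-thread local arrays.
    printf("local memory per thread (bytes): %zu\n", attr.localSizeBytes);
    return 0;
}
```

Building with `nvcc -Xptxas -v` should also print the register count per kernel at compile time, which gives a second number to compare against.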
What you’re seeing in the second screenshot is a bug. If you have a reproducer you could share, we could take a look. We’re always working to improve the quality of this data. Thanks for getting in touch.
Basically, I am writing matmul kernels. Maybe you could give this a try?
https://github.com/Yinghan-Li/YHs_Sample
Thank you!!!