Live register count is not accurate?

Hi! I am using the “Source” page in Nsight Compute and tracking register usage with “Live Registers”. The total register usage is reported as 149, but the maximum live register count is only 102?

I am using a 3050 PC with CUDA 11.7. So should I always add 47 to the max live register count to get the real max register usage?

Also, I am profiling a compiled cuBLAS kernel. Here the general register usage is 122, but the max live register count is 165…? How can this happen?

On the Source page, you’re only seeing the live registers from your code. The number in the red box comes from the compiler and could include requirements from system calls, the CUDA runtime, etc. The gap isn’t always going to be the same, so you can’t assume you’ll always need 47 more, for example.

In the second screenshot, that’s a bug. If you have a reproducer you could share, we could take a look. But we’re always working on trying to improve the quality of this data. Thanks for getting in touch.

Basically, I am writing matmul kernels. Maybe you could try the code in this repository?
https://github.com/Yinghan-Li/YHs_Sample
Thank you!!!

Thanks. We are working on this bug and an upcoming version will have a fix.

I hit the same bug in a new case, with more detailed info for you:

I uploaded the .cu and .ncu-rep files here. I ran the test on a server with an A100.

The problem is that, for most metrics, I think ncu is still using the wrong data? Or does the kernel actually use only 103 registers rather than 143? Which one is wrong?

I guess… 103 is correct? This version is much faster than before. If it were 143, the occupancy would drop by 50%, and the kernel would be much slower.
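
As a rough check of that 50% estimate, here is a minimal sketch of the register-limited occupancy arithmetic. It is not taken from the thread: the block size of 256 threads is an assumption (the thread does not state it), as is rounding the per-thread register count up to a multiple of 8. The 65536 registers and 64 resident warps per SM are the A100 (compute capability 8.0) limits. Only the register limit is modelled; shared memory and block-count limits are ignored.

```python
# Register-limited occupancy estimate for one SM (simplified sketch).
REGS_PER_SM = 65536          # 32-bit registers per SM on A100 (CC 8.0)
MAX_WARPS_PER_SM = 64        # resident warps per SM on A100 (CC 8.0)
WARP_SIZE = 32
REG_ALLOC_GRANULARITY = 8    # assumption: per-thread count rounds up to a multiple of 8


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def occupancy(regs_per_thread, threads_per_block):
    """Fraction of the SM's warp slots that can be resident, by registers only."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    regs_per_block = (
        round_up(regs_per_thread, REG_ALLOC_GRANULARITY) * WARP_SIZE * warps_per_block
    )
    blocks_per_sm = REGS_PER_SM // regs_per_block
    active_warps = min(blocks_per_sm * warps_per_block, MAX_WARPS_PER_SM)
    return active_warps / MAX_WARPS_PER_SM


if __name__ == "__main__":
    # Hypothetical block size of 256 threads; the thread does not give the real one.
    print(occupancy(103, 256))  # 2 blocks fit -> 16 of 64 warps -> 0.25
    print(occupancy(143, 256))  # 1 block fits  ->  8 of 64 warps -> 0.125
```

Under these assumptions, 103 registers per thread lets two blocks be resident per SM and 143 allows only one, which matches the halved occupancy described above. A different block size would give a different ratio.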

The 103 is the true upper limit on register needs. The 143 is related to the bug we’re working on.
