Running nsys profiling for GPU memory data on Python

I want to get GPU memory data for a Python file, but all I’m getting when I run nsys profile is the OS runtime data and the timeline. This is what I’m running:


Once I get the nsys-rep file, I make a report.txt using nsys stats and it says that the sqlite file contains no GPU memory data:

How do I get the GPU memory usage data?
Here is my report file:
report.txt (12.1 KB)

Thanks

Greetings,

Out of curiosity, if you open the nsys-rep file in the GUI, do you see the GPU memory usage on the timeline?

Nope, I can’t see it on the timeline. Here’s the nsys-rep file for your reference.

historamnsys-rep.zip (8.1 MB)

@vallabhnadgir Thank you for sharing the output!

OK, so either your program is not actually using the GPU, or we are failing to capture the GPU activity. I suspect the latter, because I see one Python process (pid 1376787) that has no Python sampling info but does have a “pt_main_thread”, which is where I assume your actual PyTorch workload is running.

I suggest you try the following:

  1. Try again with the newest version of nsys. You are using 2023.4; the newest is 2024.4.
  2. Try again with only nsys profile -t cuda python3 perform_reconstruction.py to see if you capture CUDA events. It would be interesting to know if you get any different behavior.
  3. If possible, try without multiprocessing so there is only one Python process to profile, and see if that works. It is my understanding that what you are doing should work, but let's verify that the program is in fact using the GPU and that we are able to capture the CUDA trace.
  4. If you are still unable to observe any CUDA events in the profiler, check the output of nvidia-smi while the program is running, and see if it lists python as a process with an active CUDA context.
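
If it helps, steps 2 and 4 can be scripted. The following is only a rough Python sketch, not an official tool: the helper names (build_nsys_command, query_compute_apps, parse_compute_apps, python_compute_apps) are made up for illustration, and it assumes nsys and nvidia-smi are on your PATH. The nvidia-smi query fields used here (pid, process_name, used_memory) should be listed by nvidia-smi --help-query-compute-apps on your system; adjust them if your driver version reports different names.

```python
import shlex
import subprocess


def build_nsys_command(script, trace="cuda", output=None):
    """Build the argument list for step 2: `nsys profile -t cuda python3 <script>`."""
    cmd = ["nsys", "profile", "-t", trace]
    if output is not None:
        cmd += ["-o", output]  # optional report name
    cmd += ["python3", script]
    return cmd


def query_compute_apps():
    """Step 4: ask nvidia-smi which processes currently hold a CUDA context.

    Run this while the profiled program is still executing.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_compute_apps(csv_text):
    """Turn the CSV text from query_compute_apps() into a list of dicts."""
    apps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the first and last fields so a comma inside the
        # process name does not break the parse.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        apps.append(
            {
                "pid": int(pid.strip()),
                "process_name": name.strip(),
                "used_memory": used.strip(),
            }
        )
    return apps


def python_compute_apps(csv_text):
    """Keep only entries whose process name mentions python."""
    return [
        app
        for app in parse_compute_apps(csv_text)
        if "python" in app["process_name"].lower()
    ]
```

To use it, print shlex.join(build_nsys_command("perform_reconstruction.py")) to get the command from step 2, start the profile, and call python_compute_apps(query_compute_apps()) from a second terminal while it runs. An empty list would suggest that no Python process has an active CUDA context, which points at the program rather than the profiler.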

Thanks a lot! Using nsys 2024.2.3 worked!
