I want to get GPU memory data for a Python script, but when I run nsys profile, all I get is the OS runtime data and the timeline. This is what I’m running:
Ok, so either your program is not actually using the GPU, or we are not managing to capture the GPU activity. I suspect the latter, because I see one python process (pid 1376787) that has no Python sampling info, but there is also a “pt_main_thread”, which is where I assume your actual PyTorch workload is running.
I suggest you try the following:
Try again with the newest version of nsys. You are using 2023.4; the newest is 2024.4.
Try again with only nsys profile -t cuda python3 perform_reconstruction.py to see whether you capture CUDA events. It would be interesting to know if the behavior is any different.
If possible, try without multiprocessing so there is only one Python process to profile, and see if that works. My understanding is that what you are doing should work, but let's verify that the program is in fact using the GPU and that we are able to capture the CUDA trace.
If you still can't observe any CUDA events in the profiler, check the output of nvidia-smi while the program is running, and see whether it lists python as a process with an active CUDA context.
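For the single-process test, one way to avoid maintaining two copies of the script is to route the work through a function that can run either in a pool or directly in the current process. This is only a sketch under assumptions: reconstruct_chunk and run are hypothetical stand-ins for whatever perform_reconstruction.py actually does. The "spawn" start method follows PyTorch's guidance that CUDA in child processes requires "spawn" or "forkserver" rather than "fork".

```python
import multiprocessing as mp


def reconstruct_chunk(chunk):
    # Hypothetical stand-in for the real per-chunk GPU work.
    return sum(x * x for x in chunk)


def run(chunks, workers):
    """Run all chunks in this process (workers <= 1) or in a process pool."""
    if workers <= 1:
        # Single-process path: all work happens in the process nsys launched.
        return [reconstruct_chunk(c) for c in chunks]
    # CUDA does not support "fork"-started children, so use "spawn".
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=workers) as pool:
        return pool.map(reconstruct_chunk, chunks)


if __name__ == "__main__":
    # The guard is required with "spawn": children re-import this module.
    # Profile the workers=1 path first; switch to workers > 1 afterwards.
    print(run([range(0, 3), range(3, 6)], workers=1))
```

If CUDA events show up under nsys profile -t cuda only when the work runs in one process, the multiprocessing setup is the likely reason they were missing before.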
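The nvidia-smi check can also be scripted, which helps when the program only runs for a short time. This is a minimal sketch, not an official tool. It assumes nvidia-smi is on the PATH and uses its --query-compute-apps and --format=csv options to list the processes that hold a CUDA context, along with their GPU memory use. Matching on the substring "python" in the process name is a heuristic assumption, and the helper names are hypothetical.

```python
import csv
import io
import subprocess

# nvidia-smi query: one CSV row per process holding a CUDA context.
# "nounits" drops the "MiB" suffix so used_memory parses as an integer.
QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Turn nvidia-smi CSV output into (pid, process_name, used_mib) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (f.strip() for f in fields)
        if not pid.isdigit():
            continue  # skip rows without a usable pid
        # Some setups report "[N/A]" instead of a number; keep it as None.
        rows.append((int(pid), name, int(used) if used.isdigit() else None))
    return rows


def python_cuda_processes(rows):
    """Heuristic (assumption): keep rows whose process name mentions python."""
    return [row for row in rows if "python" in row[1].lower()]


def main():
    try:
        out = subprocess.run(
            QUERY_CMD, capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not run nvidia-smi: {exc}")
        return
    matches = python_cuda_processes(parse_compute_apps(out))
    if not matches:
        print("No Python process with an active CUDA context was listed.")
    for pid, name, used in matches:
        memory = f"{used} MiB" if used is not None else "unknown memory"
        print(f"pid {pid}: {name} holds a CUDA context ({memory})")


if __name__ == "__main__":
    main()
```

Run it from a second terminal while perform_reconstruction.py is running. With multiprocessing, each process that has created a CUDA context should appear as its own row.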