I am trying to profile a deep learning training workload on AGX Orin using Nsight Systems. Without the profiler, the workload consumes about 4GB of memory, but when I run it with remote profiling in Nsight Systems, memory usage keeps increasing and within a few minutes all 32GB is used up. Is this a memory leak or expected behavior?
I would not expect it to take up this much memory.
@eaverin Since Andrey is out, can you respond to this or refer it to the correct engineer?
Any updates on this?
Thank you @virtual.ramblings for reporting this issue. This sounds like a bug and something we would like to investigate further. What details can you share with us? Would you be willing to share the report file with us? Please note that report files may contain information about the profiled process and its environment (such as filenames, environment variables, process names, symbols and function names, and more), so please only share the report file if you are comfortable with it.
Thanks for your reply. I will check and get back to you on sharing the report file. Just to be clear, are you referring to the .nsys-rep file that is generated at the end of the profiling session? Will that be sufficient to debug?
The .nsys-rep file contains a lot of information that helps us understand the environment and gives us a clue on how to reproduce the issue.
Apologies for the delay; I had to work on something else. Here is a link to the .nsys-rep file for a 5-minute run of the workload, during which memory usage increased to almost 8GB.
Hi @Andrey_Trachenko, Any updates on this?
Hi @virtual.ramblings, sorry for the delay, I was out for holidays. We’ll look into the issue. Thank you very much for attaching the report file.
No problem! Please let me know what you find.
Hi @virtual.ramblings, here are some findings based on the attached report file:
The CPU sampling frequency is too high - 10 kHz. Not only does it cause a lot of events to be generated over the 5 minutes, which slows down timeline processing, but it also adds considerable CPU overhead on the devkit and can skew the results. Given that most of the code appears to be Python, in which case the backtraces are not very useful, my recommendation is to turn off CPU sampling completely, or dial the sampling rate down to 100 Hz.
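If you are launching from the command line rather than the GUI, the equivalent settings can be sketched roughly as follows (the `train.py` invocation is a placeholder for your workload; flag names are per the nsys CLI and may vary between versions):

```shell
# Sketch: disable CPU sampling (and CPU context-switch tracing) entirely
nsys profile --sample=none --cpuctxsw=none -o report_nosampling python3 train.py

# Sketch: keep sampling, but at a much lower rate (100 Hz instead of 10 kHz)
nsys profile --sampling-frequency=100 -o report_100hz python3 train.py
```

These are non-executable CLI fragments for illustration; the GUI checkbox you attached should have the same effect as `--sample=none`.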
Thank you so much for your feedback! Will try this out.
To disable CPU sampling altogether, is this (picture attached) the correct setting?
Do you think the CPU sampling could cause the increased memory usage I’ve been seeing?
We have not confirmed this explicitly yet, but it will likely help.
I tried the workload with no CPU sampling as you suggested, but I still see the increased memory usage. Here is a link to the .nsys-rep file for an ~8-minute run I did today where memory usage went up to ~15GB.
Thank you @virtual.ramblings. Unfortunately, this duration is really pushing the boundary of what we can reasonably support with detailed profiling information in Nsight Systems (from the docs: “Nsight Systems does not support runs of more than 5 minutes duration”). I would suggest using the time limit and delayed start options to focus on a smaller part of the application runtime.
That being said, there is a chance that there is a memory leak in the injection libraries or the profiling agent process, and so more memory is used than otherwise would be necessary. We are looking into this, but at this point I don’t have an ETA on when we can have this investigated.
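For reference, the time limit and delayed start mentioned above can be expressed on the CLI roughly like this (again, `train.py` is a placeholder; check your nsys version's help for exact flag support):

```shell
# Sketch: skip the first 60 s of startup, then capture only a 120 s window
nsys profile --delay=60 --duration=120 -o short_window python3 train.py
```

This keeps the report well under the 5-minute guideline while still covering steady-state training.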
Thanks, I understand. Will stick to shorter runs for now.
I was curious about what you said about the memory leak, so I tried the following: I set the environment variables as indicated and ran the workload without starting the profiler. Even then, I saw the same increase in memory usage, pointing to a possible problem in the injection libraries, as you said. In any case, if you find something, please post your findings here, even if it is at a much later point in time. I would really appreciate it.
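For anyone who wants to reproduce this check, one simple way to watch memory growth of a process over time is to poll its resident set size from `/proc` (a minimal, Linux-only sketch using only the standard library; in practice you would pass the training process's PID instead of "self"):

```python
import time


def rss_kib(pid="self"):
    """Return the resident set size (VmRSS) of a process in KiB,
    read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:     123456 kB"
                return int(line.split()[1])
    return None


if __name__ == "__main__":
    # Poll our own RSS a few times; increase the interval and count
    # to watch a long-running training job instead.
    for _ in range(3):
        print(f"RSS: {rss_kib()} KiB")
        time.sleep(1)
```

Logging this once a minute with and without the profiler attached makes the leak (or its absence) easy to plot.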
Thanks for all your help!