Profiling memory leak issues on Jetson AGX Module

Hi,

I am facing memory leak issues in various rules/modules of our incident detection solution. The system details are as follows:

Name: Jetson AGX Module (Advantech MIC-730AI)
Version details:
JetPack 4.6
CUDA 10.2
DeepStream 6.0.1

To debug the memory leak issues, we thought of using NVIDIA Nsight Systems. Please let us know if it is the correct choice.

For debugging, we followed these steps:

  1. Started the application with the Nsight Systems CLI (2021.2.3) profile option and generated the reports.

  2. Tried to analyze the reports in the Nsight Systems GUI. As there was no Nsight Systems GUI on the AGX Module, I installed Nsight Systems (2021.2.1) on my 1650 Ti system and tried importing the report, but it gave this error:
    “Failed to import QDSTRM from: /home/smarg/AGX_Reports/report1.qdstrm
    Qdstrm version 2021.2.3.8-73c8c79 is not supported. The host GUI and qdstrm versions must match to import successfully. When importing the qdstrm file, please use the host GUI that was packaged with the software used to collect the data.”

  3. I didn’t find any way of analysing the report using the CLI commands in the documentation.

  4. Also tried to profile the application directly from the host system (Nsight Systems GUI 2021.2.1 on the 1650 Ti machine), but it says “Target is not supported”.

Note: We have a constraint that we can’t upgrade the JetPack version on the AGX Module, because no newer JetPack version with the BSP patch is available from Advantech.

Please suggest what I am doing wrong, or whether there is any way to analyze the reports using the Nsight CLI.

Hi @sheetal.vishwakarma
Nsight is a tool for debugging performance. It will help you understand performance bottlenecks in the code workflow.
For memory leaks or general debugging, Eclipse or GDB might help in locating the leaking component.

For the Nsight setup issue, you can refer to the topic below in case it helps:

Hi @SunilJB ,

I have already debugged the code and used other tools like tracemalloc to identify the code regions responsible for the growing memory, but after hours of analysis, the memory held by code-level variables and functions keeps being freed after use.
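For readers following along, the tracemalloc approach mentioned above can be sketched roughly as below. This is only an illustration: `top_growth` is a hypothetical helper name, not something from the poster's code. Note that tracemalloc only tracks allocations made through Python's own allocator, so memory held by native libraries or CUDA would not show up here, which may be why the Python-level numbers look clean.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` source lines whose allocated size grew the most.

    `before` and `after` are tracemalloc.Snapshot objects. compare_to()
    already sorts by the absolute size difference, largest first; we keep
    only the lines that actually grew.
    """
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Stand-in for one processing cycle of the real application.
    retained = [bytes(1024) for _ in range(1000)]

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
    tracemalloc.stop()
```

Taking the two snapshots many processing cycles apart, rather than back to back, tends to make slow growth easier to tell apart from normal churn.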

So now I want to look at GPU-related usage parameters to find out whether my models or some other underlying component is consuming memory and not cleaning up properly.

Just to explain the use case: when the application starts, the initial usage is 4 to 6 GB of GPU memory, which grows to 31 GB if the application runs continuously on 2 cameras for approx. 2-3 days. Then the system chokes.

Further, the Nsight version I have on the AGX Module doesn’t have a GUI executable, and I can’t upgrade my JetPack version for the reason stated above.

So I was looking for a way to analyze the report using the CLI, or, for the host system, a way to convert the report so that it can be opened in the host Nsight GUI.
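One way to put numbers on growth like this is to sample memory usage periodically and compute a rate. The sketch below is an assumption-laden illustration, not the poster's tooling: it reads `/proc/meminfo`-style text and treats system-wide used memory as a proxy for GPU usage, on the assumption that the Jetson's integrated GPU shares physical DRAM with the CPU. The function names are hypothetical.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kB}.

    Lines without a "kB" unit (e.g. HugePages_Total) are skipped.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def used_kb(info):
    """Used memory in kB, defined here as MemTotal - MemAvailable."""
    return info["MemTotal"] - info["MemAvailable"]


def growth_per_hour(samples):
    """Average growth in kB/hour between the first and last sample.

    `samples` is a list of (timestamp_seconds, used_kb) tuples.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (u1 - u0) * 3600.0 / (t1 - t0)


def sample(path="/proc/meminfo"):
    """Read the current used memory in kB from a meminfo file (Linux only)."""
    with open(path) as handle:
        return used_kb(parse_meminfo(handle.read()))
```

Calling `sample()` every few minutes alongside a timestamp, and enabling or disabling individual rules/modules between runs, could help narrow down which component the growth follows.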

Hi @sheetal.vishwakarma,
For memory-related issues, maybe you can use the CUDA-MEMCHECK tool.

Could you please share more details on what specific info you are trying to retrieve from the profiler tool to resolve this memory leak issue?

Thanks

Did the Nsys CLI only generate a .qdstrm file, or was there also a .nsys-rep (or .qdrep) file? It is true that a .qdstrm file can only be imported by the exactly matching Nsys GUI version, but .qdrep and .nsys-rep files can be loaded into that version of the GUI or any more recent version.

Can you tell me what your CLI options are? Maybe we can figure out what happened with your result file.

There is a way to export the results file as sqlite (or several other formats) that you can examine using other tools.
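As a rough sketch of what examining the SQLite export "using other tools" might look like, the snippet below lists every table in an exported database with its row count, using only Python's standard `sqlite3` module. It assumes the report has already been exported (for example with something like `nsys export --type sqlite`; check `nsys export --help` on your version for the exact options). The file name `report1.sqlite` is hypothetical, and no particular table names are assumed, since the schema can differ between Nsight Systems versions.

```python
import sqlite3


def table_row_counts(con):
    """Return {table_name: row_count} for every table in an open connection."""
    names = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names come from sqlite_master itself; quote them defensively.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    # Hypothetical path to an exported report.
    con = sqlite3.connect("report1.sqlite")
    try:
        for table, rows in table_row_counts(con).items():
            print(f"{table}: {rows} rows")
    finally:
        con.close()
```

Once the tables of interest are known, ordinary SQL queries against them can be run on the target itself, without needing the GUI.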

@sheetal.vishwakarma,

In addition to the other suggestions on the thread, you may also want to look at Valgrind (https://valgrind.org/)