Potential memory leak - compute-sanitizer shows nothing

Hi Guys,

I develop an application that does image manipulation using CUDA. I noticed that when I instantiate my application several times (e.g. in googletest), the CUDA memory of my NVIDIA Jetson Orin Nano runs out after several minutes and several instantiations. I naturally suspect a memory leak; however, when I run my application with compute-sanitizer, it does not show any leaks.

The output after a single instantiation, with memory usage shown, is as follows:

========= COMPUTE-SANITIZER
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test 
[ RUN      ] testDeinitMemoryFree
2024-08-28_09-49-29_162: [TESTLOG]: INFO: Cuda memory usage before initialization: 33.578042%
Initializing CUDA
Initializing CUDA
############### test code runs here - init and deinit ################
2024-08-28_09-49-42_936: [TESTLOG]: INFO: Cuda memory usage after initialization: 34.453110%
[       OK ] testDeinitMemoryFree (13952 ms)
[----------] 1 test (13952 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (13952 ms total)
[  PASSED  ] 1 test.
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 0 errors

The command line I used is:

sudo /usr/local/cuda-11.4/bin/compute-sanitizer --tool memcheck --leak-check full --show-backtrace yes <testprogram>

If you look closely, you see that memory usage increased by ~1% after the test was run … Anyway, compute-sanitizer does not recognize any leaked memory. Am I using the tool wrong? Is there anything I could try to see where the increased memory usage is coming from? The pattern is consistent: if I do init/deinit several more times, memory usage climbs to a level where I can't initialize CUDA anymore because there is not enough memory left.

Additional info:

  • It's a camera application running on an NVIDIA Jetson
  • It uses libArgus for image acquisition and camera configuration
  • I use CUDAHelper.h and ArgusSamples.h to initialize CUDA for image acquisition.

I appreciate your help.

You might very well have a memory leak. There's not enough information here to identify where it is. The compute-sanitizer tool does not track every possible form of memory leak. For example, Jetson has physically unified memory - host and device memory refer to the same physical backing. Therefore, without further info about what you are doing, what you are testing, or what your printouts mean, it's possible that the leak is through the use of a host API - which compute-sanitizer doesn't track.

Usually, a leak can be traced to a specific sequence of API calls. Therefore, divide-and-conquer is typically a good strategy to narrow down the source of a leak; see the sketch below.
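As a sketch of that strategy (the phase functions are made-up placeholders for your own code), you can bracket each phase of a test with cudaMemGetInfo and see in which phase the memory disappears:

#include <cstdio>
#include <cuda_runtime.h>

// Log free/total device memory with a label; calling this between the
// phases of a test shows which phase is losing memory.
static void logDeviceMem(const char* label)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("[%s] free: %zu MiB of %zu MiB\n",
           label, freeBytes >> 20, totalBytes >> 20);
}

// Usage sketch inside a test:
//   logDeviceMem("before init");
//   initCamera();                 // hypothetical phase 1
//   logDeviceMem("after init");
//   processImages();              // hypothetical phase 2
//   logDeviceMem("after processing");
//   deinitCamera();
//   logDeviceMem("after deinit");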

To track down memory errors in host code, I would recommend using valgrind.
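A typical invocation would look something like this (standard valgrind flags; <testprogram> is a placeholder for your test binary):

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes <testprogram>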

Hi!

Thanks for the answers. I know my description was not very detailed - sorry about that! The program is basically an API for the cameras, which we test using googletest. The API does the buffer handling of the received images and also uses CUDA for some image optimizations. I will dig deeper into the code and check whether I can divide it further and pinpoint the location of the suspected leak.

I also tried using valgrind, but it does not work and reports an unhandled instruction:

ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0xB8A18002
disInstr(arm64): 1011'1000 1010'0001 1000'0000 0000'0010
==3736== valgrind: Unrecognised instruction at address 0x4c6b958.
==3736==    at 0x4C6B958: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==3736==    by 0x4BE9A7B: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==3736==    by 0x4DC556B: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==3736==    by 0x4C19013: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==3736==    by 0x2B3F93: __cudart915 (in /var/tmp/testprog)
==3736==    by 0x60C53B7: __pthread_once_slow (pthread_once.c:116)
==3736==    by 0x2FE8BB: __cudart1186 (in /var/tmp/testprog)
==3736==    by 0x2AA62F: __cudart102 (in /var/tmp/testprog)
==3736==    by 0x2D6AFB: cudaMallocManaged (in /var/tmp/testprog)

It seems that valgrind cannot decode an instruction inside libcuda that is reached via cudaMallocManaged.

Have you tried non-managed memory? (It is usually faster anyway, though I'm not sure about Jetson.)
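For illustration, a minimal sketch of explicit device memory with cudaMemcpy in place of cudaMallocManaged (the function and buffer are made up). This would also sidestep the valgrind problem above, since the unrecognized instruction is reached through cudaMallocManaged:

#include <cuda_runtime.h>
#include <vector>

void processImage(std::vector<unsigned char>& img)
{
    const size_t bytes = img.size();
    unsigned char* dev = nullptr;
    cudaMalloc(&dev, bytes);                      // explicit device allocation
    cudaMemcpy(dev, img.data(), bytes, cudaMemcpyHostToDevice);
    // ... launch image-processing kernels on dev ...
    cudaMemcpy(img.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);                                // every cudaMalloc needs a cudaFree
}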

Okay, here are the further insights I got.

As said, we use googletest, and while running one test case with init/deinit (which supposedly has a memory leak), the CUDA memory climbs to ~3.5-4 GB and then stays there consistently, even over 100 iterations. I also checked every cudaMalloc we do, and every allocation has a matching cudaFree (cuda - Why doesn't CudaFree seem to free memory? - Stack Overflow). However, when the test case finishes and a second test case is started, I get the following error:

Initializing CUDA
NVMAP_IOC_GET_FD failed: Bad address
Error generated. /usr/src/jetson_multimedia_api/argus/samples/utils/CUDAHelper.cpp, initCUDA:81 Unable to initialize the CUDA driver API (CUresult unknown error)

The "Initializing CUDA" output comes from the function CUDAHelper::initCuda(), which I use to initialize the CUDA context.

When the second test case is started, the memory usage does not drop and stays around ~3.5 GB. I also observed that it does not climb any higher; the free memory always stays around 1 GB. I checked that using jetson-stats.
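Since the first test is called testDeinitMemoryFree, one alternative to eyeballing jetson-stats is asserting the balance directly in the test. A sketch, assuming googletest, with hypothetical init/deinit calls and a guessed tolerance for driver-internal caching:

#include <gtest/gtest.h>
#include <cuda_runtime.h>

TEST(CameraApi, DeinitReturnsMemory)
{
    size_t freeBefore = 0, freeAfter = 0, total = 0;
    cudaMemGetInfo(&freeBefore, &total);
    // initCamera();   // hypothetical init under test
    // deinitCamera(); // hypothetical deinit under test
    cudaMemGetInfo(&freeAfter, &total);
    // Allow ~16 MiB of slack for driver-internal caching.
    EXPECT_NEAR(static_cast<double>(freeBefore),
                static_cast<double>(freeAfter),
                16.0 * 1024 * 1024);
}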

Unfortunately, neither valgrind (which does not work, see comment above) nor compute-sanitizer shows any helpful output.

Ideally, you would be using RAII containers like thrust::device_vector, which automatically allocates and frees the memory, just like std::vector.
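For example, a minimal sketch (the element type and size are arbitrary):

#include <thrust/device_vector.h>

void work()
{
    // Device memory is allocated on construction and freed automatically
    // when the vector goes out of scope - no cudaFree to forget.
    thrust::device_vector<float> pixels(1920 * 1080);
    // ... pass thrust::raw_pointer_cast(pixels.data()) to kernels ...
}   // destructor releases the device memory here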

Hi,

Coming back to this, as I did more troubleshooting.

It seems that the problem is something else and not a memory leak. As said in my last comment, we use googletest to test our camera streaming API. The tests run fine, but after a certain number of tests in which CUDA was initialized using the CUDAHelper function, the cuCtxCreate_v2 function fails with an unknown error, as seen in my last post.
There is still plenty of memory available (>3 GB), but CUDA does not seem to be able to create a new handle. It is able to run 25 tests, and regardless of what comes after as the 26th test, it fails.

Is there any limit on how many handles an application can create? Why can I run a loop with 100 init/deinit cycles without a problem, but it crashes when another test case is started?

Is there any way to reset the handle count, or to check or increase any CUDA-related resources?

Lots of questions - hopefully someone can help me further!

Thank you

If by “handle” you mean “context”, there is a limited number of device contexts that can be simultaneously resident on a GPU. Perhaps there is some resource that is not being properly destroyed. Perhaps your usage of the Argus API does not involve a proper shutdown.
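To illustrate the pairing that has to hold, here is a driver-API sketch (error checking omitted for brevity): every cuCtxCreate must be matched by a cuCtxDestroy, otherwise contexts accumulate across tests until creation fails:

#include <cuda.h>

void runOneTest()
{
    CUdevice dev;
    CUcontext ctx;
    cuInit(0);                   // safe to call more than once
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);   // occupies a context slot on the device
    // ... test body ...
    cuCtxDestroy(ctx);           // without this, the slot stays occupied
}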

As a diagnostic, since you seem to be using the CUDA runtime API (and I normally wouldn't recommend this), you might try inserting a cudaDeviceReset() at the end of each test.
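A sketch of where such a reset could go, assuming a googletest fixture (the fixture name is made up):

#include <gtest/gtest.h>
#include <cuda_runtime.h>

class CameraTest : public ::testing::Test
{
protected:
    void TearDown() override
    {
        // Destroys all allocations and tears down the primary context
        // for this process on the current device - a blunt diagnostic.
        cudaDeviceReset();
    }
};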

Hi Robert,

Thanks for the answer. The problem was indeed that we had multiple contexts open and didn't close all of them. We switched to only two contexts for our tests and destroy them correctly during deinitialization, and this fixed the memory usage issue!
