Memory leak for CUDA runtime lib

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
[*] DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
[*] Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
[*] other

SDK Manager Version
1.9.3.10904
[*] other

Host Machine Version
[*] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

I’m using the CUDA runtime library and I’m seeing a memory leak.

The code is as follows:

#include <cuda_runtime.h>
#include <iostream>

int main(int argc, char** argv) {
    cudaSetDevice(0);
    {
        cudaStream_t stream = nullptr;
        cudaStreamCreate(&stream);
        cudaStreamDestroy(stream);
    }
    cudaDeviceSynchronize();
    cudaError_t cuda_error = cudaDeviceReset();
    if (cuda_error != cudaSuccess) {
        std::cerr << "cudaDeviceReset Error: " << cuda_error;
    }
    return 0;
}

Test command:
/opt/data/haihuawei/valgrind/bin/valgrind --tool=memcheck --leak-check=full --log-file=valgrind.log.txt ./cuda_test
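If these one-time allocations inside the driver stack turn out to be benign, they can be silenced with a Valgrind suppression file so that real application leaks stand out. A minimal sketch, matching the frames in the log below (the rule name is arbitrary and the `obj:` patterns may need adjusting to your library paths):

```
# cuda.supp — suppress the one-time allocation reported under libnvcucompat.so
{
   cuda_driver_init_leak
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   obj:*/libnvcucompat.so
   obj:*/libcuda.so*
   ...
}
```

Pass it with `--suppressions=cuda.supp` on the Valgrind command line; suppressed records are then counted in the "suppressed:" line of the leak summary instead of being reported.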

log:
==311760== 8 bytes in 1 blocks are definitely lost in loss record 23 of 1,475
==311760== at 0x484B828: malloc (vg_replace_malloc.c:442)
==311760== by 0x2E98EC4F: ??? (in /usr/lib/libnvcucompat.so)
==311760== by 0x74EEBD7: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x7516CE7: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x74EF043: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x7371717: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x741E50B: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x750C72B: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x878BFBF: ??? (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudart.so.11.4.291)
==311760== by 0x878EB93: ??? (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudart.so.11.4.291)
==311760== by 0x8E8ADF3: __pthread_once_slow (pthread_once.c:116)
==311760== by 0x87D3613: ??? (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudart.so.11.4.291)
==311760==
==311760== 12 bytes in 1 blocks are definitely lost in loss record 25 of 1,475
==311760== at 0x484B828: malloc (vg_replace_malloc.c:442)
==311760== by 0x2E98EC2F: ??? (in /usr/lib/libnvcucompat.so)
==311760== by 0x74EEBD7: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x7516CE7: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x74EF043: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x7371717: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x741E50B: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x750C72B: ??? (in /usr/lib/libcuda.so.1)
==311760== by 0x878BFBF: ??? (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudart.so.11.4.291)
==311760== by 0x878EB93: ??? (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudart.so.11.4.291)
==311760== by 0x8E8ADF3: __pthread_once_slow (pthread_once.c:116)
==311760== by 0x87D3613: ??? (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudart.so.11.4.291)
==311760==

Dear @haihua.wei,
Below is the output observed on my machine.

nvidia@tegra-ubuntu:/usr/local/cuda/samples/0_Simple/simpleCUDATest$ cat simpleCUDATest.cu
#include <cuda_runtime.h>

// includes, project
#include <helper_cuda.h>
#include <helper_functions.h>
int main(int argc, char** argv)
{
cudaSetDevice(0);
{
cudaStream_t stream=nullptr;
cudaStreamCreate(&stream);
cudaStreamDestroy(stream);
}
cudaDeviceSynchronize();
cudaError cuda_error=cudaDeviceReset();
if(cuda_error!=cudaSuccess){
std::cerr<< "cudaDeviceReset Error :"<<cuda_error;
}
return 0;
}

nvidia@tegra-ubuntu:/usr/local/cuda/samples/0_Simple/simpleCUDATest$ valgrind --tool=memcheck --leak-check=full --log-file=/home/nvidia/valgrind.log.txt ./simpleCUDATest
Illegal instruction
nvidia@tegra-ubuntu:/usr/local/cuda/samples/0_Simple/simpleCUDATest$ cat ~/valgrind.log.txt
==10016== Memcheck, a memory error detector
==10016== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10016== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==10016== Command: ./simpleCUDATest
==10016== Parent PID: 9795
==10016==
ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0xB8A18001
disInstr(arm64): 1011'1000 1010'0001 1000'0000 0000'0001
==10016== valgrind: Unrecognised instruction at address 0x53d8398.
==10016==    at 0x53D8398: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x543F713: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x543FD43: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x53D34D3: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x13868B: __cudart106 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x1387EB: __cudart917 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x48A23B7: __pthread_once_slow (pthread_once.c:116)
==10016==    by 0x187C17: __cudart1189 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x12EDB3: __cudart104 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x156447: cudaSetDevice (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x11140B: main (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016== Your program just tried to execute an instruction that Valgrind
==10016== did not recognise.  There are two possible reasons for this.
==10016== 1. Your program has a bug and erroneously jumped to a non-code
==10016==    location.  If you are running Memcheck and you just saw a
==10016==    warning about a bad jump, it's probably your program's fault.
==10016== 2. The instruction is legitimate but Valgrind doesn't handle it,
==10016==    i.e. it's Valgrind's fault.  If you think this is the case or
==10016==    you are not sure, please let us know and we'll try to fix it.
==10016== Either way, Valgrind will now raise a SIGILL signal which will
==10016== probably kill your program.
==10016==
==10016== Process terminating with default action of signal 4 (SIGILL)
==10016==  Illegal opcode at address 0x53D8398
==10016==    at 0x53D8398: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x543F713: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x543FD43: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x53D34D3: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x13868B: __cudart106 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x1387EB: __cudart917 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x48A23B7: __pthread_once_slow (pthread_once.c:116)
==10016==    by 0x187C17: __cudart1189 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x12EDB3: __cudart104 (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x156447: cudaSetDevice (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==    by 0x11140B: main (in /usr/local/cuda-11.4/samples/0_Simple/simpleCUDATest/simpleCUDATest)
==10016==
==10016== HEAP SUMMARY:
==10016==     in use at exit: 80,990 bytes in 56 blocks
==10016==   total heap usage: 81 allocs, 25 frees, 292,782 bytes allocated
==10016==
==10016== 56 bytes in 1 blocks are possibly lost in loss record 7 of 18
==10016==    at 0x4849D8C: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==10016==    by 0x51FCB8F: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x51F27F3: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x51DC1AB: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x400E8B3: call_init.part.0 (dl-init.c:72)
==10016==    by 0x400E9B3: call_init (dl-init.c:30)
==10016==    by 0x400E9B3: _dl_init (dl-init.c:119)
==10016==    by 0x4BC620B: _dl_catch_exception (dl-error-skeleton.c:182)
==10016==    by 0x4012A13: dl_open_worker (dl-open.c:758)
==10016==    by 0x4BC61AB: _dl_catch_exception (dl-error-skeleton.c:208)
==10016==    by 0x40121A3: _dl_open (dl-open.c:837)
==10016==    by 0x48C409B: dlopen_doit (dlopen.c:66)
==10016==    by 0x4BC61AB: _dl_catch_exception (dl-error-skeleton.c:208)
==10016==
==10016== 336 bytes in 6 blocks are possibly lost in loss record 14 of 18
==10016==    at 0x4849D8C: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==10016==    by 0x51FCB8F: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x51F2753: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x51DC1AB: ??? (in /usr/lib/libcuda.so.1)
==10016==    by 0x400E8B3: call_init.part.0 (dl-init.c:72)
==10016==    by 0x400E9B3: call_init (dl-init.c:30)
==10016==    by 0x400E9B3: _dl_init (dl-init.c:119)
==10016==    by 0x4BC620B: _dl_catch_exception (dl-error-skeleton.c:182)
==10016==    by 0x4012A13: dl_open_worker (dl-open.c:758)
==10016==    by 0x4BC61AB: _dl_catch_exception (dl-error-skeleton.c:208)
==10016==    by 0x40121A3: _dl_open (dl-open.c:837)
==10016==    by 0x48C409B: dlopen_doit (dlopen.c:66)
==10016==    by 0x4BC61AB: _dl_catch_exception (dl-error-skeleton.c:208)
==10016==
==10016== LEAK SUMMARY:
==10016==    definitely lost: 0 bytes in 0 blocks
==10016==    indirectly lost: 0 bytes in 0 blocks
==10016==      possibly lost: 392 bytes in 7 blocks
==10016==    still reachable: 80,598 bytes in 49 blocks
==10016==         suppressed: 0 bytes in 0 blocks
==10016== Reachable blocks (those to which a pointer was found) are not shown.
==10016== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==10016==
==10016== For lists of detected and suppressed errors, rerun with: -s
==10016== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

I don’t see any "definitely lost" messages in the log.

@SivaRamaKrishnaNV What DRIVE OS and CUDA environment is your machine running? Can you give us a reference?

Dear @haihua.wei,
Yes. I quickly tested on the DRIVE AGX Orin platform with DRIVE OS (latest internal release).
I will try with the recent DevZone release and update you with the results.

@SivaRamaKrishnaNV This has been a huge help to us. Thank you so much.

Dear @haihua.wei,
How about using https://docs.nvidia.com/cuda/compute-sanitizer/index.html ?
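As a sketch of what that would look like: Compute Sanitizer ships with the CUDA toolkit and its memcheck tool has a leak-check mode that understands CUDA allocations, unlike Valgrind, which also trips over unrecognized instructions in libcuda.so as seen above. A possible invocation (assuming `compute-sanitizer` is on your PATH; see the linked docs for the full option list):

```shell
# Run the memcheck tool with full leak checking against the repro binary
compute-sanitizer --tool memcheck --leak-check full ./cuda_test
```

This reports unfreed `cudaMalloc`/stream/context resources at exit rather than host-side heap blocks, so it is a better fit for deciding whether the CUDA runtime itself is leaking.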

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.