`cuCtxCreate` and `cuCtxDestroy` pairs have a memory leak

Here is my test code. I used the memory profiler in the Visual Studio Performance Profiler to take snapshots of the heap.

image

The results suggest that there is a leak.

My graphics card is an NVIDIA RTX 3080 Ti and the driver version is 537.13.

Here is the CUDA Toolkit version information:

{
   "cuda" : {
      "name" : "CUDA SDK",
      "version" : "12.1.1"
   },
   "cuda_cccl" : {
      "name" : "CUDA C++ Core Compute Libraries",
      "version" : "12.1.109"
   },
   "cuda_cudart" : {
      "name" : "CUDA Runtime (cudart)",
      "version" : "12.1.105"
   },
   "cuda_cuobjdump" : {
      "name" : "cuobjdump",
      "version" : "12.1.111"
   },
   "cuda_cupti" : {
      "name" : "CUPTI",
      "version" : "12.1.105"
   },
   "cuda_cuxxfilt" : {
      "name" : "CUDA cu++ filt",
      "version" : "12.1.105"
   },
   "cuda_demo_suite" : {
      "name" : "CUDA Demo Suite",
      "version" : "12.1.105"
   },
   "cuda_nvcc" : {
      "name" : "CUDA NVCC",
      "version" : "12.1.105"
   },
   "cuda_nvdisasm" : {
      "name" : "CUDA nvdisasm",
      "version" : "12.1.105"
   },
   "cuda_nvml_dev" : {
      "name" : "CUDA NVML Headers",
      "version" : "12.1.105"
   },
   "cuda_nvprof" : {
      "name" : "CUDA nvprof",
      "version" : "12.1.105"
   },
   "cuda_nvprune" : {
      "name" : "CUDA nvprune",
      "version" : "12.1.105"
   },
   "cuda_nvrtc" : {
      "name" : "CUDA NVRTC",
      "version" : "12.1.105"
   },
   "cuda_nvtx" : {
      "name" : "CUDA NVTX",
      "version" : "12.1.105"
   },
   "cuda_nvvp" : {
      "name" : "CUDA NVVP",
      "version" : "12.1.105"
   },
   "cuda_opencl" : {
      "name" : "CUDA OpenCL",
      "version" : "12.1.105"
   },
   "cuda_sanitizer_api" : {
      "name" : "CUDA Compute Sanitizer API",
      "version" : "12.1.105"
   },
   "libcublas" : {
      "name" : "CUDA cuBLAS",
      "version" : "12.1.3.1"
   },
   "libcufft" : {
      "name" : "CUDA cuFFT",
      "version" : "11.0.2.54"
   },
   "libcurand" : {
      "name" : "CUDA cuRAND",
      "version" : "10.3.2.106"
   },
   "libcusolver" : {
      "name" : "CUDA cuSOLVER",
      "version" : "11.4.5.107"
   },
   "libcusparse" : {
      "name" : "CUDA cuSPARSE",
      "version" : "12.1.0.106"
   },
   "libnpp" : {
      "name" : "CUDA NPP",
      "version" : "12.1.0.40"
   },
   "libnvjitlink" : {
      "name" : "JIT Linker Library",
      "version" : "12.1.105"
   },
   "libnvjpeg" : {
      "name" : "CUDA nvJPEG",
      "version" : "12.2.0.2"
   },
   "libnvvm_samples" : {
      "name" : "NVVM Samples",
      "version" : "12.1.105"
   },
   "nsight_compute" : {
      "name" : "Nsight Compute",
      "version" : "2023.1.1.4"
   },
   "nsight_vse" : {
      "name" : "Nsight Visual Studio Edition (VSE)",
      "version" : "2023.1.1.23089"
   },
   "nvidia_driver" : {
      "name" : "NVIDIA Windows Driver",
      "version" : "531.14"
   }
}

Please don’t post code as a picture on these forums.

When people make reports like this, I usually suggest re-verifying the results on the latest CUDA version and driver available.

If the results are confirmed, you may wish to file a bug.


Since this has host memory (usage) in view, it’s reasonable to conjecture that it might be OS-specific. But FWIW, on Linux I did not observe any change in process memory usage using `ps -eo size,pid` over 2500 loops of the above code, on CUDA 12.2. (Likewise, there was no gradual increase in device memory usage as reported by `nvidia-smi`.)
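The kind of check described above (sample process memory, run the create/destroy pair many times, sample again) can be sketched as a small harness. This is only an illustration, not code from this thread: `measureGrowth`, `GrowthReport`, and the `sampleBytes` hook are hypothetical names, and the sampler is injected so that the same logic can sit on top of whatever the platform offers (for example `GetProcessMemoryInfo` on Windows or `/proc/self/statm` on Linux). A warm-up phase is measured separately so that one-time allocations are not counted as per-iteration growth.

```cpp
#include <cassert>
#include <functional>

// Result of a growth measurement, in whatever unit the sampler returns
// (assumed to be bytes here).
struct GrowthReport {
    long long warmupDelta;   // growth during warm-up (one-time costs land here)
    long long steadyDelta;   // growth during the measured iterations
    double perIteration;     // steadyDelta divided by the measured iterations
};

// Runs `body` `warmup` times, then `iterations` more times, sampling memory
// before, in between, and after.
// `sampleBytes` is a hypothetical hook supplied by the caller; in a real test
// `body` would hold the cuCtxCreate / cuCtxDestroy pair.
GrowthReport measureGrowth(const std::function<long long()>& sampleBytes,
                           const std::function<void()>& body,
                           int warmup, int iterations) {
    assert(iterations > 0);

    const long long start = sampleBytes();
    for (int i = 0; i < warmup; ++i) body();

    const long long afterWarmup = sampleBytes();
    for (int i = 0; i < iterations; ++i) body();

    const long long end = sampleBytes();

    GrowthReport report;
    report.warmupDelta = afterWarmup - start;
    report.steadyDelta = end - afterWarmup;
    report.perIteration = static_cast<double>(report.steadyDelta) / iterations;
    return report;
}
```

A `perIteration` value that stays near zero as `iterations` grows points to one-time allocations, while a value that stays roughly constant and positive points to a per-loop leak.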

I updated CUDA to version 12.2 and ran over 100 loops of the above code.

image

This is the result, and it still shows a memory leak.

I don’t think it’s an OS issue, because the profiler shows unresolved heap allocations made by nvcuda64.dll.

Here are the call stacks of the 4 allocations.

5,720 bytes

ntdll!0x7ffe8886d041()
nvcuda64!0x7ffe3035240f()
nvcuda64!0x7ffe30233d2d()
nvcuda64!0x7ffe2fe99282()
nvcuda64!0x7ffe2fe996a0()
nvcuda64!0x7ffe2fe986cf()
nvcuda64!0x7ffe2ff47856()

5,720 bytes

ntdll!0x7ffe8886d041()
nvcuda64!0x7ffe3035240f()
nvcuda64!0x7ffe30233d2d()
nvcuda64!0x7ffe2fe99282()
nvcuda64!0x7ffe2fe997b3()
nvcuda64!0x7ffe2fe986cf()
nvcuda64!0x7ffe2ff47856()

320 bytes

ntdll!0x7ffe8886d041()
nvcuda64!0x7ffe3035240f()
nvcuda64!0x7ffe300aaa2c()
nvcuda64!0x7ffe2ff57a2f()
nvcuda64!0x7ffe2ffd682c()
nvcuda64!0x7ffe2ff49591()
nvcuda64!0x7ffe2ff4a399()

320 bytes

ntdll!0x7ffe8886d041()
nvcuda64!0x7ffe3035240f()
nvcuda64!0x7ffe300aaa2c()
nvcuda64!0x7ffe300aae9f()
nvcuda64!0x7ffe2ffd6aa6()
nvcuda64!0x7ffe2ff49591()
nvcuda64!0x7ffe2ff4a399()

Referring this to NVBUG 4308741.
We will keep you updated. The engineering team will investigate this.

Has this been fixed?
I have the same problem.
When I decode 10 to 20 streams (which corresponds to 10 to 20 loops of the example code), the heap leak is about 1 MB. (RTX 3070, driver version 537.13)

The original issue reported in this thread is fixed in very recent Windows drivers, including this one.

After I updated the GPU driver (RTX 2080, 546.33), I still see the memory leak. It looks like `cuInit(0)` leaks memory.

The program looks like this:

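#include <windows.h>   // Sleep
#include <cuda.h>      // CUDA driver API: cuInit, cuCtxCreate, cuCtxDestroy
#include <iostream>

int main() {
    int i = 5;
    Sleep(20000);   // pause before initialization (presumably for a baseline snapshot)
    cuInit(0);
    Sleep(10000);
    do {
        // Create and immediately destroy a context on device 0.
        CUcontext cuContext = NULL;
        cuCtxCreate(&cuContext, 0, 0);
        cuCtxDestroy(cuContext);

        std::cout << "GPU test end." << std::endl;
        Sleep(10000);
    } while (i--);   // post-decrement: the body runs 6 times in total
    return 0;
}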

When I run the above code, the host memory leak looks like this:

  • 4985360 ( 4985360 - 0) 2 allocs BackTraceC153472E

  •   2 (      2 -      0)	BackTraceC153472E	allocations
    

    ntdll!RtlAllocateHeap+AFD
    nvcuda64!cuProfilerStop+2C7774
    nvcuda64!???+0 : 7FF8F83E8836
    nvcuda64!???+0 : 7FF8F83E8DAA
    nvcuda64!???+0 : 7FF8F851F3B1
    nvcuda64!cuProfilerStop+199B9A
    nvcuda64!cuProfilerStop+19A70A
    nvcuda64!???+0 : 7FF8F852203B
    nvcuda64!???+0 : 7FF8F8497637
    nvcuda64!???+0 : 7FF8F84980E2
    nvcuda64!???+0 : 7FF8F849861A
    nvcuda64!cuGetErrorName+217
    nvcuda!cuInit+79

  • 2462720 ( 2462720 - 0) 1 allocs BackTraceC153482E

  •   1 (      1 -      0)	BackTraceC153482E	allocations
    

    ntdll!RtlAllocateHeap+AFD
    nvcuda64!cuProfilerStop+2C7774
    nvcuda64!???+0 : 7FF8F84CCD2B
    nvcuda64!???+0 : 7FF8F85221EF
    nvcuda64!???+0 : 7FF8F8497637
    nvcuda64!???+0 : 7FF8F84980E2
    nvcuda64!???+0 : 7FF8F849861A
    nvcuda64!cuGetErrorName+217
    nvcuda!cuInit+79

  • 655360 ( 655360 - 0) 1 allocs BackTraceC1535B2E

  •   1 (      1 -      0)	BackTraceC1535B2E	allocations
    

    ntdll!RtlAllocateHeap+AFD
    nvcuda64!cuProfilerStop+2C429F
    nvcuda64!???+0 : 7FF8F84E1F1F
    nvcuda64!???+0 : 7FF8F8497672
    nvcuda64!???+0 : 7FF8F84980E2
    nvcuda64!???+0 : 7FF8F849861A
    nvcuda64!cuGetErrorName+217
    nvcuda!cuInit+79

  • 196608 ( 196608 - 0) 1 allocs BackTraceC151856E

  •   1 (      1 -      0)	BackTraceC151856E	allocations
    

    ntdll!RtlAllocateHeap+AFD
    nvapi64!nvapi_QueryInterface+370C68
    nvapi64!nvapi_QueryInterface+378531
    nvapi64!???+0 : 7FF98FAC4371
    nvapi64!nvapi_QueryInterface+1B8A
    nvapi64!nvapi_QueryInterface+31418F
    ntdll!RtlActivateActivationContextUnsafeFast+11D
    ntdll!LdrGetProcedureAddressEx+2D7
    ntdll!LdrGetProcedureAddressEx+6A
    ntdll!RtlSwitchedVVI+D07
    ntdll!RtlGetFullPathName_UstrEx+231E
    ntdll!RtlDosPathNameToNtPathName_U+D4
    ntdll!LdrLoadDll+E4
    KERNELBASE!LoadLibraryExW+162
    nvcuda64!cuProfilerStop+B1286
    nvcuda64!cuProfilerStop+1B63BE
    nvcuda64!cuProfilerStop+1B6618
    nvcuda64!cuProfilerStop+1A397E
    nvcuda64!???+0 : 7FF8F84974D5
    nvcuda64!???+0 : 7FF8F84980E2
    nvcuda64!???+0 : 7FF8F849861A
    nvcuda64!cuGetErrorName+217
    nvcuda!cuInit+79
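To see how much the entries above add up to, the summary lines can be totaled with a few lines of code. This is a rough sketch under assumptions: the line layout (`<bytes> ( <new> - <old>) <count> allocs <trace id>`) is taken from the output as pasted here, the leading marker character is ignored, and `DiffEntry`, `parseSummaryLine`, and `totalDeltaBytes` are hypothetical names, not part of any tool.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One summary line of the heap diff: bytes gained, allocation count, trace ID.
struct DiffEntry {
    long long deltaBytes;
    long long allocCount;
    std::string traceId;
};

// Parses lines shaped like
//   "<marker> 4985360 ( 4985360 - 0) 2 allocs BackTraceC153472E"
// and returns false for anything else (stack frames, per-count lines, blanks).
bool parseSummaryLine(const std::string& line, DiffEntry& out) {
    // Skip whatever marker precedes the first number.
    const std::string::size_type pos = line.find_first_of("0123456789");
    if (pos == std::string::npos) return false;

    std::istringstream in(line.substr(pos));
    long long delta = 0, newBytes = 0, oldBytes = 0, count = 0;
    char open = 0, minus = 0, close = 0;
    std::string word, trace;
    if (!(in >> delta >> open >> newBytes >> minus >> oldBytes >> close
             >> count >> word >> trace)) {
        return false;
    }
    if (open != '(' || minus != '-' || close != ')' || word != "allocs") {
        return false;
    }

    out.deltaBytes = delta;
    out.allocCount = count;
    out.traceId = trace;
    return true;
}

// Sums the byte deltas of all summary lines found in `lines`.
long long totalDeltaBytes(const std::vector<std::string>& lines) {
    long long total = 0;
    DiffEntry entry;
    for (const std::string& line : lines) {
        if (parseSummaryLine(line, entry)) total += entry.deltaBytes;
    }
    return total;
}
```

Fed the four summary lines above (4985360, 2462720, 655360, and 196608 bytes), `totalDeltaBytes` returns 8300048, i.e. roughly 8.3 MB across the listed backtraces.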
