Using CUBLAS in a shared library (or a memory leak in CUBLAS?)

I am writing a library that uses CUBLAS to fit statistical models. Currently, the routine that does the CUBLAS work calls cublasCreate() and cublasDestroy() itself. In some cases, this routine needs to be called repeatedly with varying inputs, for example in a simulation study.

What I’ve noticed is that even when I simply call the cublasCreate()/cublasDestroy() pair, a small amount (<1 MB) of GPU memory is never freed. In fact, unless I call cudaDeviceReset(), roughly 40 MB is allocated and never freed on each call to the cublasCreate()/cublasDestroy() pair.

I assume this probably isn’t a memory leak, but rather that I am not using CUBLAS as intended. With that said, does it make sense to call cublasCreate()/cublasDestroy() as I am doing, or should I be doing something different with my single execution thread?
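For reference, here is a minimal sketch of the pattern the CUBLAS API is designed around: create the handle once, pass it into the routine that is called repeatedly, and destroy it only at the end, so the library’s internal allocations happen once rather than per call. The fit_model() routine and its arguments are hypothetical placeholders, not part of my actual library.

```cuda
#include <stdio.h>
#include <cublas_v2.h>

/* Hypothetical per-iteration work: the handle is passed in,
 * not created/destroyed inside the routine. */
static void fit_model(cublasHandle_t handle, const double *d_x, int n)
{
    double result = 0.0;
    /* e.g., a vector norm as a stand-in for the real model fit;
     * d_x is assumed to be device memory */
    cublasDnrm2(handle, n, d_x, 1, &result);
    printf("norm = %f\n", result);
}

int main(void)
{
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    /* Repeated calls (e.g., a simulation study) reuse one handle. */
    for (int rep = 0; rep < 1000; ++rep) {
        /* fit_model(handle, d_x, n); */
    }

    cublasDestroy(handle);  /* destroy once, at the end */
    return 0;
}
```

With this structure, cudaDeviceReset() is also only needed (if at all) once at program exit rather than between iterations.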

Thanks for any help!

Here is some valgrind output (note that nothing is in use at exit when I do not use CUBLAS):

==11437== HEAP SUMMARY:
==11437==     in use at exit: 44,798 bytes in 51 blocks
==11437==   total heap usage: 98,961 allocs, 98,910 frees, 60,601,533 bytes allocated

==11437== 16 bytes in 1 blocks are definitely lost in loss record 1 of 45
==11437==    at 0x4A06FC7: operator new(unsigned long) (vg_replace_malloc.c:261)
==11437==    by 0x9212072: ??? (in /usr/local/cuda-5.0/lib64/
==11437==    by 0x923E478: ??? (in /usr/local/cuda-5.0/lib64/
==11437==    by 0x91FDD6F: ??? (in /usr/local/cuda-5.0/lib64/
==11437==    by 0x3A7D00F77D: _dl_fini (in /lib64/
==11437==    by 0x3A7D439930: __run_exit_handlers (in /lib64/
==11437==    by 0x3A7D4399B4: exit (in /lib64/
==11437==    by 0x3A7D4216A3: (below main) (in /lib64/


==11437== LEAK SUMMARY:
==11437==    definitely lost: 16 bytes in 1 blocks
==11437==    indirectly lost: 0 bytes in 0 blocks
==11437==      possibly lost: 1,496 bytes in 11 blocks
==11437==    still reachable: 43,286 bytes in 39 blocks
==11437==         suppressed: 0 bytes in 0 blocks


What CUDA library version are you using?

I am asking because I noticed a similar issue with CUDA 4.2 whenever a cudaGetDevice() call is done before the first cudaSetDevice() call. I also got 16 bytes reported as ‘definitely lost’ by valgrind (v. 3.8.1). The problem seems to be solved with CUDA 5.5, but some bytes remain ‘possibly lost’. If a cudaSetDevice() is done before the first cudaGetDevice(), no leak is reported. I suspect that the 16 bytes that are reported by valgrind in your example come from that problem, not from cublasCreate(). However, cublasCreate() seems to generate a number of ‘possibly lost’ bytes (I tested with both CUDA 4.2 and CUDA 5.5).
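If that is indeed the cause, the workaround in my tests was simply to make cudaSetDevice() the first CUDA call in the program. A minimal sketch (device index 0 is just an example):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    /* Issuing cudaSetDevice() before any other CUDA call avoided the
     * 16-byte 'definitely lost' report from valgrind in my tests. */
    cudaSetDevice(0);

    int dev = -1;
    cudaGetDevice(&dev);   /* safe now: set came before the first get */
    printf("active device: %d\n", dev);

    cudaDeviceReset();     /* release the context before exit */
    return 0;
}
```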

More generally, with CUDA 5.5, it appears that the first CUDA call (e.g., cudaGetDeviceCount(), cudaSetDevice(), cudaDeviceReset()) always generates a number of ‘possibly lost’ bytes. I am not sure whether this is a valgrind issue or some initialization bug in the CUDA library.