Unexpected leak

Hi everyone,

I’m having trouble with code I wrote 6 months ago. It ran fine back then, but it no longer does.

Running the program twice in a row with the same input parameters gives different results. I first thought it could be a memory leak, so I ran it under valgrind (device and deviceemu) and it reported leaks.

I reproduced the error with the simple code below (respectively main.cpp, func.h and func.cu) :

// main.cpp
#include <stdio.h>
#include "func.h"

int main(int argc, char **argv)
{
	runFunction();
	return 0;  // 0 reports success to the shell
}

// func.h
#ifndef YOUYOU
#define YOUYOU

#include <stdio.h>

extern "C" void runFunction();

#endif

// func.cu
#include "func.h"

extern "C" void runFunction()
{
	float4 *variable;

	// Allocate, zero, and free 1000 float4 values on the device.
	cudaMalloc((void **) &variable, 1000*sizeof(float4) );
	cudaMemset(variable, 0, 1000*sizeof(float4) );
	cudaFree(variable);
}

I compiled it with

/usr/local/cuda/bin/nvcc -deviceemu -g main.cpp func.cu -o a

and then ran

valgrind --show-reachable=yes --leak-check=full ./a

The result of valgrind is:

==7213== Memcheck, a memory error detector.

==7213== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.

==7213== Using LibVEX rev 1658, a library for dynamic binary translation.

==7213== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.

==7213== Using valgrind-3.2.1, a dynamic binary instrumentation framework.

==7213== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.

==7213== For more details, rerun with: -v

==7213== 

==7213== 

==7213== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 1)

==7213== malloc/free: in use at exit: 12,023 bytes in 18 blocks.

==7213== malloc/free: 51 allocs, 33 frees, 34,303 bytes allocated.

==7213== For counts of detected errors, rerun with: -v

==7213== searching for pointers to 18 not-freed blocks.

==7213== checked 622,456 bytes.

==7213== 

==7213== 20 bytes in 1 blocks are still reachable in loss record 1 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4F92B9E: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F93701: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 32 bytes in 1 blocks are still reachable in loss record 2 of 14

==7213==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)

==7213==    by 0x3B1C00156A: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 40 bytes in 1 blocks are still reachable in loss record 3 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x3B1A00B8E3: _dl_map_object_deps (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A010C6C: dl_open_worker (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213== 

==7213== 

==7213== 43 bytes in 2 blocks are still reachable in loss record 4 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x3B1A00A035: _dl_new_object (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A005ACB: _dl_map_object_from_fd (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A007D72: _dl_map_object (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A010C0C: dl_open_worker (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213== 

==7213== 

==7213== 43 bytes in 2 blocks are still reachable in loss record 5 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x3B1A00576A: open_path (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A007F27: _dl_map_object (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A010C0C: dl_open_worker (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213== 

==7213== 

==7213== 72 bytes in 1 blocks are still reachable in loss record 6 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4F9D21E: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F966AF: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 120 bytes in 1 blocks are still reachable in loss record 7 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x3B1A00BA63: _dl_map_object_deps (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A010C6C: dl_open_worker (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213== 

==7213== 

==7213== 200 bytes in 1 blocks are still reachable in loss record 8 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4F9235F: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F93701: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 208 bytes in 1 blocks are still reachable in loss record 9 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4FBE472: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4FBEF96: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F936C9: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 208 bytes in 1 blocks are still reachable in loss record 10 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4FBEB34: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4FBEF80: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F93402: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 208 bytes in 1 blocks are still reachable in loss record 11 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4FBCAD9: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F96708: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== 

==7213== 312 bytes in 2 blocks are still reachable in loss record 12 of 14

==7213==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)

==7213==    by 0x3B1A00E7E5: _dl_check_map_versions (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A010F08: dl_open_worker (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213== 

==7213== 

==7213== 2,325 bytes in 2 blocks are still reachable in loss record 13 of 14

==7213==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)

==7213==    by 0x3B1A009DCB: _dl_new_object (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A005ACB: _dl_map_object_from_fd (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A007D72: _dl_map_object (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A010C0C: dl_open_worker (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)

==7213==    by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)

==7213==    by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)

==7213==    by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213== 

==7213== 

==7213== 8,192 bytes in 1 blocks are still reachable in loss record 14 of 14

==7213==    at 0x4A05809: malloc (vg_replace_malloc.c:149)

==7213==    by 0x4F9D23E: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F966AF: (within /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)

==7213==    by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)

==7213==    by 0x400957: runFunction (func.cu:6)

==7213==    by 0x40074B: main (main.cpp:7)

==7213== 

==7213== LEAK SUMMARY:

==7213==    definitely lost: 0 bytes in 0 blocks.

==7213==      possibly lost: 0 bytes in 0 blocks.

==7213==    still reachable: 12,023 bytes in 18 blocks.

==7213==         suppressed: 0 bytes in 0 blocks.

I’m convinced that the trouble comes from my installation rather than from CUDA itself (I ran a valgrind test when I first implemented that code and it came back clean). I updated my CUDA driver and toolkit; however, it didn’t fix anything :'(

The result of uname -a is : “Linux 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 07:42:41 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux” and /etc/redhat-release contains “CentOS release 5.2 (Final)”.

If anyone has a solution or an idea, please feel free to help me.

Thanks in advance.

Cheers

Marc

If you’re seeing a memory leak under HOSTEMU, then it’s unlikely that this is a CUDA bug.

I agree, mainly because valgrind didn’t report any error with the same code 6 months ago.
However, when I run an equivalent “classic” C/C++ program, everything goes fine, so in some way it might be “CUDA-related”.
Has anyone had a similar problem?

If you get different results with the same input, you’re either hitting uninitialized memory or you’ve got a race condition. Leaks with Valgrind don’t necessarily mean anything about the correctness of your code, but illegal accesses certainly do.
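To illustrate why the "still reachable" records above say little about correctness, here is a minimal host-only C++ sketch. `DriverState`, `get_driver_state` and `leak_block` are hypothetical names made up for this example; they only mimic the pattern of a library that allocates state once during initialization and keeps a pointer to it until the process exits. This is not how libcuda is actually implemented.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for state that a library sets up once at init time.
struct DriverState {
    int device_count;
    char scratch[64];
};

// A global pointer still refers to the block at exit, so a leak checker
// can reach it: this is the "still reachable" category.
static DriverState *g_state = nullptr;

// Allocates the state on first use and deliberately never frees it.
DriverState *get_driver_state() {
    if (g_state == nullptr) {
        g_state = static_cast<DriverState *>(std::calloc(1, sizeof(DriverState)));
        assert(g_state != nullptr);
        g_state->device_count = 1;
    }
    return g_state;
}

// A real leak for comparison: the only pointer to the block is dropped
// when the function returns.
void leak_block() {
    void *p = std::malloc(128);
    (void)p;
}
```

If both functions are called from a `main` and the program is built without optimization, valgrind would typically list the first block as "still reachable" and the second as "definitely lost". Only the second kind grows with repeated calls, and neither kind explains results that change from run to run.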

Ok, thanks for the answers.

I was wondering: if valgrind is not appropriate for checking CUDA code, is there something else that could be used?

Oh, but valgrind is appropriate to check CUDA code. You just have to ignore the false positive results. cudart does some fancy tricks internally in cudaMalloc and especially cudaMallocHost that result in lots of false positive error reports. I run valgrind on my CUDA app with the attached suppression file, and it has helped immensely in finding bugs in kernels that access out of bounds memory.

Brilliant!! Thanks :)

I’d like to try your suppression file. Can you attach it or send it to me?

I was sure that it was attached before. Let’s try this again.

Edit: I won’t guarantee that this is complete, but it is good enough for all the unit tests in HOOMD to run through valgrind without a peep when compiled in emulation mode (unless I’ve got an out of bounds memory access bug, of course).
cudart.valgrind.supp.txt (1.99 KB)
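For readers without access to the attachment, here is a rough sketch of what valgrind suppression entries for the records in the first post could look like. This is not the content of the attached file. The entry names are made up, and the frame patterns are simply read off the stack traces shown earlier in the thread, so they may need adjusting for other driver or glibc versions.

```
{
   cuda_driver_init_malloc
   Memcheck:Leak
   fun:malloc
   obj:*/libcuda.so*
}
{
   cudart_dlopen_calloc
   Memcheck:Leak
   fun:calloc
   fun:_dlerror_run
   fun:dlopen*
}
```

Saved under a name such as cuda.supp (a hypothetical filename), the file would be passed with the --suppressions option:

```
valgrind --suppressions=cuda.supp --show-reachable=yes --leak-check=full ./a
```

Each entry matches the innermost frames of a stack trace, so the first one should cover the malloc calls made directly inside libcuda.so and the second the calloc made through dlopen. Records with other call chains would need their own entries.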

This is great, thank you!