Hi everyone,
I’m currently having trouble with code I wrote 6 months ago. It was running fine back then, but it no longer does.
When I run this program twice in a row with the same input parameters, I get different results. I first thought it could be a memory leak, so I ran it under valgrind (both device and deviceemu builds), and it reported leaks.
I reproduced the problem with the simple code below (main.cpp, func.h and func.cu, in that order):
// main.cpp
#include <stdio.h>
#include "func.h"

int main(int argc, char **argv)
{
    runFunction();
    return 0;  // 0 signals success to the shell
}

// func.h
#ifndef YOUYOU
#define YOUYOU
#include <stdio.h>
extern "C" void runFunction();
#endif

// func.cu
#include "func.h"

extern "C" void runFunction()
{
    float4 *variable;
    // Allocate, zero and release 1000 float4 elements on the device.
    cudaMalloc((void **) &variable, 1000*sizeof(float4) );
    cudaMemset(variable, 0, 1000*sizeof(float4) );
    cudaFree(variable);
}
I compiled it with
/usr/local/cuda/bin/nvcc -deviceemu -g main.cpp func.cu -o a
and then ran
valgrind --show-reachable=yes --leak-check=full ./a
The result of valgrind is:
==7213== Memcheck, a memory error detector.
==7213== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==7213== Using LibVEX rev 1658, a library for dynamic binary translation.
==7213== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==7213== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==7213== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==7213== For more details, rerun with: -v
==7213==
==7213==
==7213== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 1)
==7213== malloc/free: in use at exit: 12,023 bytes in 18 blocks.
==7213== malloc/free: 51 allocs, 33 frees, 34,303 bytes allocated.
==7213== For counts of detected errors, rerun with: -v
==7213== searching for pointers to 18 not-freed blocks.
==7213== checked 622,456 bytes.
==7213==
==7213== 20 bytes in 1 blocks are still reachable in loss record 1 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4F92B9E: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F93701: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 32 bytes in 1 blocks are still reachable in loss record 2 of 14
==7213== at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==7213== by 0x3B1C00156A: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 40 bytes in 1 blocks are still reachable in loss record 3 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x3B1A00B8E3: _dl_map_object_deps (in /lib64/ld-2.5.so)
==7213== by 0x3B1A010C6C: dl_open_worker (in /lib64/ld-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)
==7213== by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213==
==7213==
==7213== 43 bytes in 2 blocks are still reachable in loss record 4 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x3B1A00A035: _dl_new_object (in /lib64/ld-2.5.so)
==7213== by 0x3B1A005ACB: _dl_map_object_from_fd (in /lib64/ld-2.5.so)
==7213== by 0x3B1A007D72: _dl_map_object (in /lib64/ld-2.5.so)
==7213== by 0x3B1A010C0C: dl_open_worker (in /lib64/ld-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)
==7213== by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213==
==7213==
==7213== 43 bytes in 2 blocks are still reachable in loss record 5 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x3B1A00576A: open_path (in /lib64/ld-2.5.so)
==7213== by 0x3B1A007F27: _dl_map_object (in /lib64/ld-2.5.so)
==7213== by 0x3B1A010C0C: dl_open_worker (in /lib64/ld-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)
==7213== by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213==
==7213==
==7213== 72 bytes in 1 blocks are still reachable in loss record 6 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4F9D21E: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F966AF: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 120 bytes in 1 blocks are still reachable in loss record 7 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x3B1A00BA63: _dl_map_object_deps (in /lib64/ld-2.5.so)
==7213== by 0x3B1A010C6C: dl_open_worker (in /lib64/ld-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)
==7213== by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213==
==7213==
==7213== 200 bytes in 1 blocks are still reachable in loss record 8 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4F9235F: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F93701: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 208 bytes in 1 blocks are still reachable in loss record 9 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4FBE472: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4FBEF96: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F936C9: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 208 bytes in 1 blocks are still reachable in loss record 10 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4FBEB34: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4FBEF80: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F93402: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F9675D: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 208 bytes in 1 blocks are still reachable in loss record 11 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4FBCAD9: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F96708: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213==
==7213== 312 bytes in 2 blocks are still reachable in loss record 12 of 14
==7213== at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==7213== by 0x3B1A00E7E5: _dl_check_map_versions (in /lib64/ld-2.5.so)
==7213== by 0x3B1A010F08: dl_open_worker (in /lib64/ld-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)
==7213== by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213==
==7213==
==7213== 2,325 bytes in 2 blocks are still reachable in loss record 13 of 14
==7213== at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==7213== by 0x3B1A009DCB: _dl_new_object (in /lib64/ld-2.5.so)
==7213== by 0x3B1A005ACB: _dl_map_object_from_fd (in /lib64/ld-2.5.so)
==7213== by 0x3B1A007D72: _dl_map_object (in /lib64/ld-2.5.so)
==7213== by 0x3B1A010C0C: dl_open_worker (in /lib64/ld-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1A0105FB: _dl_open (in /lib64/ld-2.5.so)
==7213== by 0x3B1C000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==7213== by 0x3B1A00CE55: _dl_catch_error (in /lib64/ld-2.5.so)
==7213== by 0x3B1C00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==7213== by 0x3B1C000F10: dlopen@@GLIBC_2.2.5 (in /lib64/libdl-2.5.so)
==7213== by 0x4C12440: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213==
==7213==
==7213== 8,192 bytes in 1 blocks are still reachable in loss record 14 of 14
==7213== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==7213== by 0x4F9D23E: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F966AF: (within /usr/lib64/libcuda.so.177.67)
==7213== by 0x4F8B98F: cuInit (in /usr/lib64/libcuda.so.177.67)
==7213== by 0x4C34B5B: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C39105: (within /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x4C1D093: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.2.0)
==7213== by 0x400957: runFunction (func.cu:6)
==7213== by 0x40074B: main (main.cpp:7)
==7213==
==7213== LEAK SUMMARY:
==7213== definitely lost: 0 bytes in 0 blocks.
==7213== possibly lost: 0 bytes in 0 blocks.
==7213== still reachable: 12,023 bytes in 18 blocks.
==7213== suppressed: 0 bytes in 0 blocks.
I’m convinced that the trouble comes from my installation rather than from CUDA itself (I ran a valgrind test when I first implemented that code and it came back clean). I updated my CUDA driver and toolkit; however, that didn’t fix anything :'(
The result of uname -a is: “Linux 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 07:42:41 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux” and /etc/redhat-release contains “CentOS release 5.2 (Final)”.
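The leak summary above lists only "still reachable" blocks allocated inside cuInit and dlopen, with 0 bytes definitely or possibly lost, so the differing results may well have another cause. One thing that could still be tried is checking the status returned by each CUDA runtime call, in case one of them fails silently. Below is a minimal single-file sketch of that idea. The file name check.cu, the CHECK macro and runFunctionChecked are all hypothetical names introduced for illustration; none of them is in the original code.

```cuda
// check.cu: hypothetical single-file variant of the test case.
#include <stdio.h>
#include <cuda_runtime.h>

// CHECK is an illustrative helper, not part of the original code.
// On failure it prints the CUDA error string and makes the
// enclosing function return 1.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s failed: %s\n",             \
                    __FILE__, __LINE__, #call,                    \
                    cudaGetErrorString(err));                     \
            return 1;                                             \
        }                                                         \
    } while (0)

// Same three calls as runFunction, each with its status checked.
// Note: on an early return the buffer is not freed; acceptable
// for a short diagnostic run, not for production code.
static int runFunctionChecked()
{
    float4 *variable = NULL;
    CHECK(cudaMalloc((void **) &variable, 1000 * sizeof(float4)));
    CHECK(cudaMemset(variable, 0, 1000 * sizeof(float4)));
    CHECK(cudaFree(variable));
    return 0;
}

int main(void)
{
    // Exit status is 0 only if every runtime call succeeded.
    return runFunctionChecked();
}
```

It should build the same way as the test case, for example with /usr/local/cuda/bin/nvcc -g check.cu -o check. A non-zero exit status or a message on stderr would point at a failing runtime call rather than at a host-side leak.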
If anyone has a solution or an idea, please feel free to help me.
Thanks in advance.
Cheers
Marc