Error running with Valgrind

Hello,

I have a memory problem in my program. Something really annoying sometimes causes a segmentation fault and sometimes doesn’t.

Running it under Valgrind, I got this:

==4826== Memcheck, a memory error detector.

==4826== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.

==4826== Using LibVEX rev 1884, a library for dynamic binary translation.

==4826== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.

==4826== Using valgrind-3.4.1-Debian, a dynamic binary instrumentation framework.

==4826== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.

==4826== For more details, rerun with: -v

==4826== 

Iniciando programa

==4826== Warning: set address range perms: large range [0x88bc028, 0x108759c0) (undefined)

==4826== Syscall param ioctl(generic) points to uninitialised byte(s)

==4826==	at 0x40007F2: (within /lib/ld-2.9.so)

==4826==	by 0x494ABD2: (within /usr/lib/libcuda.so.195.36.15)

==4826==	by 0x492A2AB: (within /usr/lib/libcuda.so.195.36.15)

==4826==	by 0x48FD528: (within /usr/lib/libcuda.so.195.36.15)

==4826==	by 0x48F58F6: (within /usr/lib/libcuda.so.195.36.15)

==4826==	by 0x499C860: cuCtxCreate (in /usr/lib/libcuda.so.195.36.15)

==4826==	by 0x415D261: (within /usr/local/cuda/lib/libcudart.so.3.0.14)

==4826==	by 0x415DE0B: (within /usr/local/cuda/lib/libcudart.so.3.0.14)

==4826==	by 0x4144D16: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.3.0.14)

==4826==	by 0x804AC86: rk4(float*, float*, float*, int, int, int*, float, float, float, int, int) (in /home/pedro/Dropbox/Unicamp/Iniciação/Processamento em GPUs/Códigos/Fonte/Traçamento de Raios/Multicore/tr-raios-multi-3.0)

==4826==	by 0x8049F0D: main (in /home/pedro/Dropbox/Unicamp/Iniciação/Processamento em GPUs/Códigos/Fonte/Traçamento de Raios/Multicore/tr-raios-multi-3.0)

==4826==  Address 0xbea2f358 is on thread 1's stack

Iniciando simulacao: Feito - no error

Simulacao concluida

Salvando:Feito

==4826== Warning: set address range perms: large range [0x88bc018, 0x108759d0) (noaccess)

==4826== 

==4826== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 38 from 3)

==4826== malloc/free: in use at exit: 340,720 bytes in 695 blocks.

==4826== malloc/free: 4,341 allocs, 3,646 frees, 153,621,184 bytes allocated.

==4826== For counts of detected errors, rerun with: -v

==4826== Use --track-origins=yes to see where uninitialised values come from

==4826== searching for pointers to 695 not-freed blocks.

==4826== checked 651,968 bytes.

==4826== 

==4826== LEAK SUMMARY:

==4826==	definitely lost: 1,341 bytes in 65 blocks.

==4826==	  possibly lost: 280 bytes in 1 blocks.

==4826==	still reachable: 339,099 bytes in 629 blocks.

==4826==		 suppressed: 0 bytes in 0 blocks.

==4826== Rerun with --leak-check=full to see details of leaked memory.

Can someone help me understand what is going on?

Debugging my code, I believe I found the “hole”.

When I reach this:

#define pos 10

#define raios 20

#define TAM_MAX 1e7

...

cudaMalloc((void**)&d_inteta,(raios*sizeof(float)));

  cudaMalloc((void**)&d_inx,(pos*sizeof(float)));

  cudaMalloc((void**)&d_p1,(pos*raios*sizeof(float)));

  cudaMalloc((void**)&d_p2,(pos*raios*sizeof(float)));

  cudaMalloc((void**)&d_out,(TAM_MAX*sizeof(float)));

  cudaMalloc((void**)&d_parada,(pos*raios*sizeof(float)));

Valgrind reports this error:

==8478== Syscall param ioctl(generic) points to uninitialised byte(s)

==8478==	at 0x40007F2: (within /lib/ld-2.9.so)

==8478==	by 0x485BBD2: (within /usr/lib/libcuda.so.195.36.15)

==8478==	by 0x483B2AB: (within /usr/lib/libcuda.so.195.36.15)

==8478==	by 0x480E528: (within /usr/lib/libcuda.so.195.36.15)

==8478==	by 0x48068F6: (within /usr/lib/libcuda.so.195.36.15)

==8478==	by 0x48AD860: cuCtxCreate (in /usr/lib/libcuda.so.195.36.15)

==8478==	by 0x405F261: (within /usr/local/cuda/lib/libcudart.so.3.0.14)

==8478==	by 0x405FE0B: (within /usr/local/cuda/lib/libcudart.so.3.0.14)

==8478==	by 0x4046D16: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.3.0.14)

==8478==	by 0x8059BED: rk4(float*, float*, float*, int, int, int*, float, float, float, float) (in /home/pedro/Dropbox/Unicamp/Iniciação/Processamento em GPUs/Códigos/Fonte/Traçamento de Raios/Multicore/Backup 13 05 10/tr-raios-multi)

==8478==	by 0x8059EE2: main (in /home/pedro/Dropbox/Unicamp/Iniciação/Processamento em GPUs/Códigos/Fonte/Traçamento de Raios/Multicore/Backup 13 05 10/tr-raios-multi)

==8478==  Address 0xbe968388 is on thread 1's stack

Is this an error in cudaMalloc?

Why don’t you add some error checking? Every one of those calls returns a status, and you should check it. It could be something as benign as running out of memory, or using a floating point constant as a size…

Hello Avidday, thanks for your help.

I did that and I always get “no error”. I also have several safety checks telling me there is enough memory to allocate what I need.

As I tried (and apparently failed) to point out in my first reply, this:

#define TAM_MAX 1e7

.....

cudaMalloc((void**)&d_out,(TAM_MAX*sizeof(float)));

looks extremely suspicious. You shouldn’t use floating point values as sizes (in this case you have a double * size_t, which is a double). I doubt that will work as intended, and even if it does, it is horrible programming practice and probably indicative of the other kinds of dragons likely lying dormant in your code, any one of which could be causing your problem.

But TAM_MAX has an integer value (1e7 = 10,000,000) and so does sizeof(float) (sizeof(float) = 4). So TAM_MAX*sizeof(float) is an integer too, isn’t it?

1e7 is a double precision constant.
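A minimal sketch of the point being made, in plain C with no CUDA dependency. The names TAM_MAX_ELEMS and float_buffer_bytes are hypothetical and not from the original code. It shows that the original size expression has type double, and one possible way to keep the byte count in size_t from start to finish:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element count written as an integer constant instead of the double
   literal 1e7. TAM_MAX_ELEMS is a hypothetical name for illustration. */
#define TAM_MAX_ELEMS ((size_t)10000000)

/* sizeof reports the size of the expression's type without evaluating it.
   1e7 * sizeof(float) is double * size_t, which is a double. */
static size_t size_of_original_expression(void) {
    return sizeof(1e7 * sizeof(float));
}

/* Byte count for `count` floats, computed entirely in size_t. */
static size_t float_buffer_bytes(size_t count) {
    assert(count <= SIZE_MAX / sizeof(float)); /* guard against overflow */
    return count * sizeof(float);
}
```

With this, the allocation could be written as cudaMalloc((void**)&d_out, float_buffer_bytes(TAM_MAX_ELEMS)), so no floating point value is involved in the size at all.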

Oh… I am a mathematician, so to me 1e7 is an integer value. I wasn’t thinking as a programmer (the int range goes from -32768 to +32767, right?)

Well, I need to allocate all this memory. Is there any option for me?

Would adding an “(int)” cast fix it?

cudaMalloc((void**)&d_out,((int)(TAM_MAX*sizeof(float))));

The C standard defines a header called limits.h, which contains the ranges of the integer types and can be used to determine these limits. On whatever platform you are using for CUDA, int should range from -2^31 to 2^31 - 1 and unsigned int from 0 to 2^32 - 1. The standard macro INT_MAX gives the largest value of the int type.
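For reference, the macro that limits.h defines for the largest int is INT_MAX. Below is a small sketch, assuming nothing beyond the standard header, of how a byte count could be checked against it before being narrowed to int. The helper names fits_in_int and to_int_checked are made up for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if `bytes` can be stored in a plain int, 0 otherwise. */
static int fits_in_int(unsigned long long bytes) {
    return bytes <= (unsigned long long)INT_MAX;
}

/* Narrows to int only after checking the value is in range. */
static int to_int_checked(unsigned long long bytes) {
    assert(fits_in_int(bytes));
    return (int)bytes;
}
```

On a platform with a 32-bit int, 40,000,000 bytes fits comfortably. Since cudaMalloc takes its size as a size_t, narrowing to int is not needed for that call in the first place.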

If you are familiar with the theory of floating point numbers, you will know that a floating point value is not, in general, guaranteed to be an exact representation of the integer it looks like (1e7 happens to be exact in IEEE 754 double precision, but larger values need not be), so a size computed as (int)(x*sizeof(float)) is not guaranteed to be the number you expect. So no, that is not a good idea either. Just as importantly, the bit pattern of the double 1e7, read as a 64 bit integer, is 4711630319722168320, so if a floating point number ever reaches a routine expecting an integer without a proper conversion, it can do some very unexpected things.
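To make that 64 bit number concrete, here is a short sketch that assumes an IEEE 754 platform where double is 64 bits wide. The helper names are hypothetical. It contrasts a value conversion, which keeps the number, with a reinterpretation of the same bytes, which does not:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets the bytes of a double as a 64-bit unsigned integer.
   Assumes double is 64 bits wide (true for IEEE 754 doubles). */
static uint64_t double_bits(double x) {
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Ordinary value conversion: the fractional part is discarded and the
   numeric value is kept when it is in range. */
static uint64_t double_value(double x) {
    return (uint64_t)x;
}
```

A call through a proper prototype performs the value conversion. The reinterpreted bit pattern only shows up when the bytes are read as an integer without any conversion.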

Maybe it is time to read a book on C programming.

My program still has this damn bug. I am not sure where the problem is.

I changed TAM_MAX to just 100, so I only ask for 100 floats in this cudaMalloc, and the bug continues…

==9309== Syscall param ioctl(generic) points to uninitialised byte(s)

==9309==	at 0x4369619: ioctl (syscall-template.S:82)

==9309==	by 0x4955BD2: ??? (in /usr/lib/libcuda.so.195.36.15)

==9309==	by 0x49352AB: ??? (in /usr/lib/libcuda.so.195.36.15)

==9309==	by 0x4908528: ??? (in /usr/lib/libcuda.so.195.36.15)

==9309==	by 0x49008F6: ??? (in /usr/lib/libcuda.so.195.36.15)

==9309==	by 0x49A7860: cuCtxCreate (in /usr/lib/libcuda.so.195.36.15)

==9309==	by 0x415B261: ??? (in /usr/local/cuda/lib/libcudart.so.3.0.14)

==9309==	by 0x415BE0B: ??? (in /usr/local/cuda/lib/libcudart.so.3.0.14)

==9309==	by 0x4142D16: cudaMalloc (in /usr/local/cuda/lib/libcudart.so.3.0.14)

==9309==	by 0x8069B03: rk4(float*, float*, slot*, int, int, int*, float, float, float, int, int) (gpua.cu:81)

==9309==	by 0x8059874: main (main.cu:183)

==9309==  Address 0xbed3d1b8 is on thread 1's stack

==9309==

Line gpua.cu:81 is the first cudaMalloc I call (if I change the order of the cudaMalloc calls, the warning still appears on the first one).

Actually, you may notice that this is a warning caused by an ioctl, which almost certainly means that Valgrind is unable to understand how the CUDA driver accesses some device memory. This is not a real error, just a warning due to CUDA not reporting this memory access to Valgrind. So far, I’ve seen this warning in every application during the first CUDA call (when the device/driver is initialized), so even a mere cudaFree(0) would give you the same warning. Perhaps NVIDIA’s driver should get rid of that warning by telling Valgrind that this is a valid access…

Cédric