cudaMemGetInfo() how does it work?!?

Hello,

I am currently programming an acceleration structure on the GPU for raytracing.
There is a bug in my code, but I can't figure out where. The program runs, and the bug appears at a non-deterministic iteration.
Sometimes my structure is built 500 times without an unspecified launch failure, and sometimes only twice with the same options set.
I already know that something weird is written into my structure's memory; for instance, the splitting dimension of a node is 32012, even though it should be between 0 and 3 (leaf = 3).
The next thing I found is that in the first iteration the used memory of the graphics card is lower than the used memory in the second iteration. After the second iteration it stays constant.
So now my question: how does cudaMemGetInfo() determine how much memory is used?
Would it recognize if I wrote past the bounds of my allocated memory, or isn't that possible?
My guess is that when some function allocates memory, a global counter is incremented, and when cudaMemGetInfo() is called, this counter is used to report the memory usage?
The thing is, I don't allocate new memory during an iteration, so the memory usage shouldn't increase, but it does…
I am glad for any hint.

regards,
peter

I just added this code to my project:

cudaMem.cu

and call checkGpuMem() when needed.

#include <stdio.h>
#include "cuda.h"

extern "C"
void checkGpuMem()
{
    float free_m, total_m, used_m;
    size_t free_t, total_t;

    // Query free and total device memory in bytes.
    cudaMemGetInfo(&free_t, &total_t);

    // Convert bytes to MB.
    free_m  = (uint)free_t / 1048576.0;
    total_m = (uint)total_t / 1048576.0;
    used_m  = total_m - free_m;

    printf("mem free %zu .... %f MB  mem total %zu .... %f MB  mem used %f MB\n",
           free_t, free_m, total_t, total_m, used_m);
}

Contrary to popular belief, cuMemGetInfo() does not actually rely on magic. We ask the kernel mode driver how much memory has been allocated on the card. However, this will not look for out-of-bounds accesses or anything like that; what you want is cuda-memcheck or cuda-gdb (both are rightly considered miracles).
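To make the suggestion concrete: both tools wrap your existing binary, no recompilation required (though building with -g -G gives much better source-line reporting). The binary name ./raytracer below is hypothetical; substitute your own.

```shell
# Run under cuda-memcheck: it reports each out-of-bounds or misaligned
# device access with the kernel name, thread/block index, and address.
cuda-memcheck ./raytracer

# Or debug interactively; cuda-gdb stops at the faulting kernel instruction.
cuda-gdb ./raytracer
```

For a bug that only appears after hundreds of iterations, cuda-memcheck is usually the faster route, since it flags the first illegal access rather than the later crash it causes.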

The code posted by jam11 is defective on GPUs with greater than 4GB of memory and should not be used as-is in any CUDA code.