I’m running into some odd (and incorrect) behavior from cudaFree() when it is called from a different thread than the one that called cudaMalloc().
I’m using an 8800 GTX on a win32 machine, and my CUDA code is compiled into a DLL which I call into from Java. Creating certain Java objects triggers a call into CUDA that allocates memory via cudaMalloc(). When the Java garbage collector cleans such an object up, it makes a CUDA call that frees the memory.
My allocation and freeing routines work fine when I call them one after the other in the same thread. But when cudaMalloc() is called from one thread and cudaFree() from the garbage collector (which runs in a different thread), the free does not appear to take effect: I soon run out of memory, as if I hadn’t freed anything. I also get a segfault when the program finishes executing (perhaps when Java unloads the DLL, but I’m not sure).
I don’t see this problem with cudaMallocArray() and cudaFreeArray().
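The allocate-in-one-thread, free-in-another pattern described above can be sketched roughly as follows. This is only an illustration of the call sequence, not the actual code: the names `DeviceBuffer`, `nativeAlloc`, `nativeFree`, and `CrossThreadFreeSketch` are hypothetical, the two "native" methods are stand-ins that just record the calling thread instead of calling into the CUDA DLL, and an explicit second thread plays the role of the garbage collector's thread.

```java
// Hypothetical stand-in for a Java object that owns device memory.
// The real version would call into the CUDA DLL via JNI; here the two
// "native" methods only record which thread invoked them.
final class DeviceBuffer {
    final long allocThreadId;          // thread that "allocated"
    volatile long freeThreadId = -1;   // thread that "freed"; -1 until released

    DeviceBuffer(int bytes) {
        allocThreadId = nativeAlloc(bytes);
    }

    void release() {
        freeThreadId = nativeFree();
    }

    // Stand-in (assumption) for the JNI call that would wrap cudaMalloc().
    private static long nativeAlloc(int bytes) {
        return Thread.currentThread().getId();
    }

    // Stand-in (assumption) for the JNI call that would wrap cudaFree().
    private static long nativeFree() {
        return Thread.currentThread().getId();
    }
}

final class CrossThreadFreeSketch {
    // Allocates on the calling thread, then releases on a second thread,
    // which plays the role of the garbage collector's thread.
    static DeviceBuffer allocateThenFreeElsewhere(int bytes) {
        DeviceBuffer buf = new DeviceBuffer(bytes);
        Thread collector = new Thread(buf::release, "gc-stand-in");
        collector.start();
        try {
            collector.join();  // wait until the "free" has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return buf;
    }

    public static void main(String[] args) {
        DeviceBuffer buf = allocateThenFreeElsewhere(1024);
        System.out.println("alloc thread id: " + buf.allocThreadId);
        System.out.println("free thread id:  " + buf.freeThreadId);
        System.out.println("same thread: " + (buf.allocThreadId == buf.freeThreadId));
    }
}
```

Running it shows that the allocation and the free are issued from two different threads, which is the situation in which the problem appears. The same-thread case that works corresponds to calling `release()` directly on the thread that constructed the object.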