Strange unhandled exception with CUDA 5.0 and CUBLAS

My project uses CUBLAS and my own kernels to perform a number of linear algebra calculations, and for the first time I am getting a mysterious error when I exit main() (not before). This is the output:

First-chance exception at 0x000007fef51006c2 in gpu_admm.exe: 0xC0000005: Access violation reading location 0x0000000000000000.
Unhandled exception at 0x000007fef51006c2 in gpu_admm.exe: 0xC0000005: Access violation reading location 0x0000000000000000.
The program ‘[1336] gpu_admm.exe: Native’ has exited with code -1073741819 (0xc0000005).

It seems to be memory related, and Visual Studio takes me to the host_runtime.h file with the cursor pointing to this area of code:

static void **__cudaFatCubinHandle;

static void __cdecl __cudaUnregisterBinaryUtil(void)
{
    __cudaUnregisterFatBinary(__cudaFatCubinHandle);
}

I am using Visual Studio 2010 x64 (Windows 7), with the most recent version of CUDA and CUBLAS. This is the first strange error I have seen. I check error codes for all device-host memory operations and all CUBLAS operations throughout the run, and none of them report an error. After the message appears the console gets stuck and I have to close Visual Studio.

Any ideas of what may be going on?

I have narrowed this down to a line that copies a device array to a host array (with cudaMemcpy) inside a host function. That line does not return a cudaError_t error when it runs, but it results in the above error upon exiting main(). If I comment that line out, I do not get the error.

I would still like to get an idea of what the problem may be, so I can avoid this issue in the future. Why would the error show up only upon leaving main(), rather than as an error code at the time of the copy? I am assuming there is some type of memory leak, but the nature of this error is confusing.

Check the size of the copy, as well as the pointers passed to it. If the size is too large, you may be overwriting important data structures on the host side, which then leads to an access violation later on (the actual fatal error appears to be the de-referencing of a null pointer).
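
As a rough illustration of that size check, here is a minimal sketch. It assumes the host destination is a std::vector; checked_copy_bytes is a hypothetical helper name, not part of the CUDA runtime. It only computes the byte count to pass to cudaMemcpy and refuses a request that would run past the end of the host buffer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a CUDA API): returns the byte count to hand to
// cudaMemcpy for `count` elements, after confirming the host vector can hold them.
template <typename T>
std::size_t checked_copy_bytes(const std::vector<T>& host_dst, std::size_t count)
{
    if (count > host_dst.size()) {
        // Copying more than the buffer holds would overwrite unrelated host memory.
        throw std::length_error("copy exceeds host buffer");
    }
    // cudaMemcpy takes its size argument in bytes, not in elements.
    return count * sizeof(T);
}
```

The intended use would be something like cudaMemcpy(h.data(), d_ptr, checked_copy_bytes(h, n), cudaMemcpyDeviceToHost), where h, d_ptr and n are placeholder names for the host vector, the device pointer and the element count. A check like this catches only an oversized copy into a known host buffer; it says nothing about whether the device pointer itself is valid.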

In general it is a good idea to use memory checkers. For device code there is cuda-memcheck, for host code there is valgrind on Linux. No idea what the equivalent of valgrind for Windows would be.

njuffa,

OK, that makes sense. I do have a (host) memory checker which I will use.

Thanks!