cuda-memcheck identifies libcuda.so as source of a cudaErrorIllegalAddress error

I’m getting a cudaErrorIllegalAddress error (an illegal memory access) on a CUDA API call to cudaFree. cuda-memcheck attributes the error to libcuda.so, as shown below:

=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x32f753]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudart.so.8.0 (cudaFree + 0x186) [0x3de66]
=========     Host Frame:./myProgram [0x12cb76]
=========     Host Frame:./myProgram [0x15f645]
=========     Host Frame:./myProgram [0x191416]
=========     Host Frame:./myProgram [0x191161]
=========     Host Frame:./myProgram [0x45ea9]
=========     Host Frame:./myProgram [0x37523]
=========     Host Frame:./myProgram [0x2cab7]
=========     Host Frame:./myProgram [0x27718]
=========     Host Frame:./myProgram [0x24c42]
=========     Host Frame:./myProgram [0x20a64]
=========     Host Frame:./myProgram [0x1f0bd]
=========     Host Frame:./myProgram [0x1e8c7]
=========     Host Frame:./myProgram [0x1e39a]
=========     Host Frame:./myProgram [0x1c076]
=========     Host Frame:./myProgram [0x1c043]
=========     Host Frame:./myProgram [0x1bfc8]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf1) [0x203f1]
=========     Host Frame:./myProgram [0xbdfa]

The error was thrown after an attempt to free some memory. I have done an error check after my last kernel call, before the cudaFree call that threw the exception, but the check did not detect any errors.

I am at a loss as to how to proceed, as the error does not point to any of my kernels but rather to libcuda.so. Are such errors typical? What other debugging options should I be looking at in this case? BTW, I am running CUDA 8 (instead of CUDA 9) because I’ve got a Fermi GPU.

I will post a minimal example as soon as I am able to (the actual code is somewhat complicated).

Update

Thinking that my kernel had accessed invalid memory, I decided to try an allocation, a memset and a deallocation right after the kernel, like so:

myKernel<<<1,1>>>();

int* d_test;

status = cudaMalloc( &d_test , 25 * sizeof(int) );
ERROR_CHECK( status )  // OK

status = cudaMemset( d_test , 25 * sizeof(int) , 0 );
ERROR_CHECK( status )	// OK

status = cudaFree( d_test );
ERROR_CHECK( status ) // error

However, the error occurs only on the cudaFree call. I’ve got several other cudaFree calls, and the first cudaFree call always generates this error.

This call doesn’t look correct to me:

status = cudaMemset( d_test , 25 * sizeof(int) , 0 );

(the signature is cudaMemset(devPtr, value, count), so the last two arguments look swapped), but I don’t think that it has anything to do with what you are reporting.
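For reference, the runtime API declares cudaMemset(devPtr, value, count), with the byte value second and the byte count third. Here is a minimal sketch of the same allocate/memset/free sequence with the arguments in that order. It assumes the intent was to zero 25 ints, and it uses plain if-checks because the definition of ERROR_CHECK is not shown in the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int* d_test = NULL;
    const size_t bytes = 25 * sizeof(int);  // assumed intent: zero 25 ints

    cudaError_t status = cudaMalloc(&d_test, bytes);
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(status));
        return 1;
    }

    // Signature: cudaMemset(void* devPtr, int value, size_t count).
    // value is the byte value to write, count is the number of bytes to set.
    status = cudaMemset(d_test, 0, bytes);
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaMemset: %s\n", cudaGetErrorString(status));
        return 1;
    }

    status = cudaFree(d_test);
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaFree: %s\n", cudaGetErrorString(status));
        return 1;
    }
    return 0;
}
```

With the arguments as originally posted, the call asks for zero bytes to be set, which is probably why it returned without an error.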

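My guess is that you have an illegal access occurring in a kernel. Kernel launches are asynchronous, so an error inside a kernel is reported by whichever later API call first synchronizes with the device; here that happens to be cudaFree. To test, try adding this just prior to the cudaFree call throwing the error: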

status=cudaDeviceSynchronize();
ERROR_CHECK( status )

If the error moves to that call, then you have the tiger by the tail - work backwards, or just do rigorous error checking.
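As an illustration of what rigorous error checking can look like, here is a minimal sketch that checks both the launch and the execution of a kernel. CHECK_CUDA is a hypothetical helper written for this example (it is not the ERROR_CHECK macro used above, whose definition is not shown), and the kernel body and its arguments are placeholders rather than the real myKernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report the file/line of a failing runtime call and exit.
#define CHECK_CUDA(expr)                                          \
    do {                                                          \
        cudaError_t err_ = (expr);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Placeholder kernel with a bounds check; stands in for the real myKernel.
__global__ void myKernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

int main()
{
    const int n = 25;
    int* d_data = NULL;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(int)));

    myKernel<<<1, 32>>>(d_data, n);
    CHECK_CUDA(cudaGetLastError());       // launch failures (e.g. bad configuration)
    CHECK_CUDA(cudaDeviceSynchronize());  // waits for the kernel; execution errors surface here

    CHECK_CUDA(cudaFree(d_data));
    printf("done\n");
    return 0;
}
```

Checking cudaGetLastError() alone right after the launch only catches launch-time problems; an illegal address hit while the kernel is running is not reported until something synchronizes. Once the failing kernel is known, building with nvcc's -lineinfo option should let cuda-memcheck report the source line of the offending access.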

You’re right! I sandwiched my kernel between two cudaDeviceSynchronize() calls, like so:

status = cudaDeviceSynchronize();
ERROR_CHECK( status )    // OK

myKernel<<<1,1>>>();

status = cudaDeviceSynchronize();
ERROR_CHECK( status )   // NOT OK

Clearly the problem is caused by my kernel, and I’m sort of pleased that’s the case. I’d rather fix my kernel than be stuck with a broken runtime library.

Thanks txBob

Update
Found and fixed the bug in my kernel :)