We’ve got a Windows-based CUDA application using CUDA SDK 11.1 Update 1 that performs some number crunching. One configuration of our application is reporting cudaErrorIllegalAddress from a cudaStreamSynchronize() call after downloading some data from device to host. There’s nothing helpful in our logging, and the problem doesn’t reproduce in our debug builds (which have much more extensive error-checking and therefore run significantly slower).
I added some extra stream-sync calls prior to the download call in our release build, and confirmed that the illegal-address error is actually being raised by (one of) the many kernels launched before the download and stream-sync calls. However, doing a stream-sync after each kernel significantly slows things down again, which prevents the original cudaErrorIllegalAddress from happening.
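For context, the per-kernel checking I bolted on looks roughly like this (a sketch, not our real code – myKernel, launchChecked, and the KERNEL_DEBUG_SYNC macro are placeholder names):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Log any CUDA error with file/line context (simplified from our logging).
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
        }                                                             \
    } while (0)

__global__ void myKernel(float* data, int n)  // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void launchChecked(float* d_data, int n, cudaStream_t stream)
{
    myKernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    CHECK_CUDA(cudaGetLastError());  // catches launch-configuration errors only
#ifdef KERNEL_DEBUG_SYNC
    // The extra per-kernel sync: surfaces asynchronous faults right here,
    // but this is what makes everything run so much slower.
    CHECK_CUDA(cudaStreamSynchronize(stream));
#endif
}
```

With KERNEL_DEBUG_SYNC defined, each launch is checked individually (slow); without it, a device-side fault only surfaces at some later synchronizing call.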
OK, so I fired up compute-sanitizer in memcheck mode (“compute-sanitizer --tool memcheck <our_binary>”) to try to trap the offending kernel execution or API call. Now the weird part: compute-sanitizer doesn’t catch the error when it occurs. But we still get the error back from the runtime API, and our application logs it.
Our logging shows a cudaStreamSynchronize() returning that same cudaErrorIllegalAddress prior to the download call (a cudaMemcpyAsync with cudaMemcpyDeviceToHost).
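The download path is shaped roughly like this (names are illustrative, not our actual code):

```cuda
#include <cuda_runtime.h>

// Queue a device-to-host copy, then wait for the stream to drain.
// Note: the cudaStreamSynchronize below is where cudaErrorIllegalAddress
// gets reported to us, even though the faulting access happened in a
// kernel launched earlier on this stream.
cudaError_t downloadResults(void* hostDst, const void* devSrc,
                            size_t bytes, cudaStream_t stream)
{
    cudaError_t err = cudaMemcpyAsync(hostDst, devSrc, bytes,
                                      cudaMemcpyDeviceToHost, stream);
    if (err != cudaSuccess) return err;

    return cudaStreamSynchronize(stream);  // returns cudaErrorIllegalAddress
}
```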
But compute-sanitizer doesn’t show any errors – no kernel causing issues, no API call with bad arguments. I recompiled our kernels with -lineinfo in case that affects the tool’s ability to show where in the kernel the error occurs, but same thing: compute-sanitizer reports no errors.
What am I missing? compute-sanitizer is definitely doing something, as performance drops through the floor when running our application under it. But I’m struggling to even come up with an idea of how the CUDA runtime could catch this error and return it to us for logging, yet compute-sanitizer wouldn’t catch it first.