Illegal memory access, but memcheck and sanitizer report 0 errors

Hi, I’m running CUDA code on a GTX 680 GPU and it returns ‘ERROR: an illegal memory access was encountered’. The code runs fine on the more recent GPUs I tried (2070, 960, etc.). I ran cuda-memcheck and compute-sanitizer, and both report 0 errors in the error summary. Everything uses the CUDA 10.2 toolkit; only compute-sanitizer is run from the 11 toolkit.

Since the code runs fine on the other GPUs and memcheck/sanitizer report 0 errors, I don’t think the problem is in the code. Any idea what may cause this issue?

I checked some of the threads about the same type of error and tried to debug by manually printing every time the code accesses the memory buffer, but nothing has produced the error so far except on the GTX 680. Is there a more efficient way to debug this, other than memcheck/compute-sanitizer, since both report 0 errors?

I assume that the error message “ERROR: an illegal memory access was encountered” has been conclusively and unambiguously linked to an illegal memory access on the GPU.

If cuda-memcheck can’t find anything and instrumentation didn’t bring up any leads, a minimal, self-contained repro code seems necessary to make forward progress on this. Keep in mind that the root cause of the problem may not be in the failing kernel itself, but rather in some argument passed to the kernel (an incorrect pointer perhaps), or in host code, such as a missing status check on a (failing) cudaMalloc() call.
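To illustrate the host-side status checks mentioned above, here is a minimal sketch, not the poster's actual code. The `CUDA_CHECK` macro and the `scale` kernel are hypothetical names made up for this example; only the CUDA runtime calls themselves are real API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report file/line and abort on any runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,       \
                    __LINE__, #call, cudaGetErrorString(err_));       \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for the one that fails.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // bounds check guards the last block
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;

    // A failed allocation caught here never reaches the kernel as a bad pointer.
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());        // launch and configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    printf("done\n");
    return 0;
}
```

Checking every call this way narrows down which API call first reports the failure, instead of a later call picking up an error left over from earlier asynchronous work.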

There have been rare cases where a compiler code generation bug led to out-of-bounds memory access. A possible hint of that (but not conclusive evidence) would be if the illegal memory access goes away when you lower the ptxas optimization level. The default is -Xptxas -O3. You can lower it one level at a time, e.g. -Xptxas -O2, then -Xptxas -O1, finally -Xptxas -O0.
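As a rough sketch of how that experiment could be set up, the skeleton below lists one build line per ptxas level in its header comment. The file name `repro.cu`, the output names, and the `suspect` kernel are placeholders, and `-arch=sm_30` is an assumption based on the GTX 680 being a compute capability 3.0 part; adjust these to the real project.

```cuda
// repro.cu (hypothetical file name). Build once per ptxas optimization level
// and run each binary on the GTX 680:
//
//   nvcc -arch=sm_30 -Xptxas -O3 -o repro_O3 repro.cu   (default level)
//   nvcc -arch=sm_30 -Xptxas -O2 -o repro_O2 repro.cu
//   nvcc -arch=sm_30 -Xptxas -O1 -o repro_O1 repro.cu
//   nvcc -arch=sm_30 -Xptxas -O0 -o repro_O0 repro.cu
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the failing kernel; substitute the real one here.
__global__ void suspect(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main()
{
    const int n = 1024;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return 1;

    suspect<<<(n + 127) / 128, 128>>>(d_out, n);

    // Synchronize so an error raised during kernel execution surfaces here.
    cudaError_t err = cudaDeviceSynchronize();
    printf("result: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

If the error shows up at -O3 but disappears at a lower level, that is a hint worth reporting along with the repro, though a latent bug in the code can also be sensitive to optimization level.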