CUDA Runtime Error Only With Memcheck

Hi, I have unit tests that run a bunch of CUDA code, including cuFFT and cuBLAS. If I run the unit tests in a loop without cuda-memcheck, they pass every time. However, if I run them under cuda-memcheck, I start to see a swath of errors on the second iteration:

CUDA Runtime Error: an illegal instruction was encountered
cuRand Runtime Error
cudaMalloc Error: an illegal instruction was encountered

The line numbers it gives seem to have no relation to the error. For example, the runtime error is reported on a line that simply calls cudaDeviceSynchronize(). Is this expected?

It's certainly possible. Whether it is expected or not could only be answered with a test case.

As one possible example, if you overrun an allocated array by, let’s say, 1 element, it’s possible for the runtime error checking system to completely miss this. Only if you run your code under cuda-memcheck will this type of error be consistently caught. Once cuda-memcheck flags an error, it will show up in subsequent runtime API error checks as well. That could be on a cudaDeviceSynchronize() call, for example. Kernel-detected errors are asynchronous (whether the runtime itself can catch them, or it takes cuda-memcheck to observe them), which means they are reported on CUDA runtime API calls made after the kernel completes, not at the line that caused them.
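To illustrate the asynchronous reporting, here is a minimal sketch. The kernel, the CHECK_CUDA macro, and the sizes are all made up for illustration and are not taken from your code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro (an assumption, not from the original post):
// print any CUDA runtime error with the file and line of the call that returned it.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
        }                                                             \
    } while (0)

// Deliberately buggy kernel: the thread with i == n writes one element
// past the end of the allocation.
__global__ void overrun(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= n) {  // bug: should be i < n
        data[i] = i;
    }
}

int main()
{
    const int n = 256;
    int *d = nullptr;
    CHECK_CUDA(cudaMalloc(&d, n * sizeof(int)));

    overrun<<<2, 256>>>(d, n);

    // Reports launch-configuration problems only; the kernel may not have run yet.
    CHECK_CUDA(cudaGetLastError());

    // Errors raised while the kernel executes surface here, or on a later API call.
    CHECK_CUDA(cudaDeviceSynchronize());

    // After a kernel fault, later calls such as this one also return an error.
    CHECK_CUDA(cudaFree(d));
    return 0;
}
```

Built with nvcc and run normally, an overrun this small would probably go unnoticed. Under cuda-memcheck I would expect the out-of-bounds write to be flagged, with the error then returned from cudaDeviceSynchronize() and the calls after it, none of which is the line that actually contains the bug.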

I wouldn’t immediately put “illegal instruction” in that category, but again, I wouldn’t reach any conclusions on my own without a test case.

Hi Robert, based on my previous usage of cuda-memcheck, I thought the tool itself would have reported the error in a different format where it reports that memory was overwritten. In this case it’s a CUDA runtime error instead. Is that something you’d expect?

cuda-memcheck can detect many, but not all, errors. Some errors depend on the environment a program runs in, which can affect, for example, the exact location of memory allocations or the content of uninitialized memory. It is certainly possible that your program has a latent bug that does not manifest itself when you run without cuda-memcheck, but does manifest itself when you run with cuda-memcheck.

Hi Robert and njuffa, this turned out to be a bug in cuda-memcheck that we found a while ago, but manifesting itself differently. We saw an issue on Volta cards where we would have to create an FFT plan before calling cublasCreate, or we would see runtime errors. We confirmed with NVIDIA last year that it was a bug and that it was being fixed. Since this new code is quite complicated and the calls are buried deep inside it, I didn’t realize we were hitting the same issue.
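For anyone hitting the same thing, here is a minimal sketch of the ordering workaround. The plan size and transform type are placeholders, not values from our code, and I am only assuming that this ordering is what matters in other setups too.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cufft.h>

int main()
{
    // Workaround ordering: create the cuFFT plan first...
    cufftHandle plan;
    // 1024-point, single-batch C2C transform: placeholder values for illustration.
    cufftResult fftStatus = cufftPlan1d(&plan, 1024, CUFFT_C2C, 1);
    if (fftStatus != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan1d failed with status %d\n", (int)fftStatus);
        return 1;
    }

    // ...and only then create the cuBLAS handle.
    cublasHandle_t handle;
    cublasStatus_t blasStatus = cublasCreate(&handle);
    if (blasStatus != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed with status %d\n", (int)blasStatus);
        cufftDestroy(plan);
        return 1;
    }

    // ... work that uses both libraries would go here ...

    cublasDestroy(handle);
    cufftDestroy(plan);
    return 0;
}
```

This needs to be linked against both libraries, for example with nvcc and -lcufft -lcublas.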