In my project I encountered a strange issue where, after a call to a function returns that is implemented in a shared library, a call to
cudaGetLastError() will return
cudaErrorMissingConfiguration, while when called at the end of the function in the shared library, it returns no error. In both cases, I call
cudaDeviceSynchronize() before to ensure that all asynchronous work is waited for. I am also pretty sure that there is no C++ class destructor executing any other code at the end of the shared library function, and the program is single-threaded.
The above behavior seems to me like it should never happen, thus I am suspecting it to be caused by a bug in CUDA. I created a (relatively) small test program to reproduce the issue, which is attached to this post: cuda-bug-sample-program.tar.gz (5.6 MB). Unfortunately, I had to include loguru with the program as the bug did not occur without it, but as far as I can tell, loguru’s code is executed only after the bug occurs.
To give a short description of what the program does:
- The executable
testapp's main function is implemented in
main.cu. It allocates two pieces of CUDA memory and then calls
LibFunction()is implemented in
lib.cu, which gets compiled into the shared library
testlib. It executes
cub::DeviceScan::ExclusiveSum()on the two pieces of CUDA memory, checks for errors, and returns. The error checks in this function do not report anything.
- After having called
main()performs error checking itself. Here, suddenly the error
cudaErrorMissingConfigurationis returned by the
CUDA_CHECK_NO_ERROR()macro (which internally calls
Here is how you can further convince yourself that this appears to be some kind of compiler bug:
- The file
unrelated.cucontains a function
ThisFunctionIsNotCalledAnywhere(), which, true to its name, is not called anywhere. This file is compiled into the
unrelated.cuis commented out or removed from
CMakeLists.txt, then the program suddenly runs successfully, even though it shouldn’t affect its execution at all. (The function
ThisFunctionIsNotCalledAnywhere()contains a call to
cub::DeviceScan::ExclusiveSum()- I suspect that this in some weird way interferes with the call to the same function from the shared library).
- If the file
cuda_util.his edited, replacing the two instances of loguru’s
LOG(FATAL)in this file with
std::cout, then the program also finishes without errors, even though these lines should only be executed if an error has already occurred.
- The issue happens only if
-DCMAKE_BUILD_TYPE=RelWithDebInfois used in the CMake call to configure the program. With
Release, it does not occur.
- valgrind does not show any error, and compute-sanitizer only shows the cudaErrorMissingConfiguration error that is also found by the program’s error checks.
Given that the issue seemingly only happens in very specific circumstances, I hope that somebody will still be able to reproduce this. I included the build folder including the compiled program in the attached archive for the case it helps anything.
In the specific circumstances where the issue occurs, it seems to happen very consistently however. With my full project code, I also tried downgrading CUDA to 11.5.2 and 11.4.4, and the issue kept happening. Changing between GCC 9 and GCC 11 also did not make the issue go away.
My environment is:
- System: Ubuntu 20.04 (with
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0)
- GPU: Geforce RTX 3090
- NVIDIA Driver Version: 510.47.03
- CUDA version (output of
nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2022 NVIDIA Corporation Built on Thu_Feb_10_18:23:41_PST_2022 Cuda compilation tools, release 11.6, V11.6.112 Build cuda_11.6.r11.6/compiler.30978841_0
I built and ran the provided sample program as follows:
mkdir build cd build # Using CMake version 3.22.3 installed via snap: # sudo snap install cmake --classic /snap/cmake/current/bin/cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CUDA_ARCHITECTURES="86" .. make ./testapp
Am I overlooking anything substantial here or is that actually some kind of compiler bug related to CUDA?
Thanks in advance for any help.