Compiling a Catch2 application with nvcc -std=c++20 leads to a crash in cudafe++

Compiling the main() part of a Catch2 application with nvcc in c++20 mode leads to memory exhaustion inside cudafe++, likely due to an infinite loop.

A reproducer can be as simple as test.cu:

#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>

Compiling with nvcc in c++17 mode works fine:

/usr/local/cuda-12.1/bin/nvcc -std=c++17 test.cu -c -o test.o

Compiling with nvcc in c++20 mode seems to hang, and the process is eventually killed:

/usr/local/cuda-12.1/bin/nvcc -std=c++20 test.cu -c -o test.o
Killed

Investigating with nvcc -v -keep shows that the problem is in the cudafe++ step:

/usr/local/cuda-12.1/bin/nvcc -std=c++20 test.cu -c -o test.o -v -keep -keep-dir tmp
...
gcc -std=c++20 -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__  "-I/usr/local/cuda-12.1/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=1 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "test.cu" -o "tmp/test.cpp4.ii"
cudafe++ --c++20 --gnu_version=110300 --display_error_number --orig_src_file_name "test.cu" --orig_src_path_name "/home/fwyzard/src/nvidia_bug_nnnnnnnn/test.cu" --allow_managed  --m64 --parse_templates --gen_c_file_name "tmp/test.cudafe1.cpp" --stub_file_name "test.cudafe1.stub.c" --gen_module_id_file --module_id_file_name "tmp/test.module_id" "tmp/test.cpp4.ii"
Killed
# --error 0x89 --

The last line of tmp/test.cudafe1.cpp is over 300 MB of repeating std::remove_cv_t< const std::remove_cv_t< const std::remove_cv_t< const ..., which points to some kind of infinite loop inside cudafe++.
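
One way to check this without opening the whole file is to look only at its size and its last few hundred bytes. This is just a sketch, assuming the tmp directory used with -keep-dir in the command above:

```shell
# Show the size of the partially generated file left behind by cudafe++.
ls -lh tmp/test.cudafe1.cpp
# Print only the last 300 bytes, enough to see the repeating pattern.
tail -c 300 tmp/test.cudafe1.cpp
```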

Submitted as NVIDIA bug #4139863.
For a trivial reproducer, see the GitHub repository fwyzard/nvidia_bug_4139863.

We have exactly the same problem. We are using Catch, and with C++17 it works. With C++20, cudafe++ seems to be stuck in an endless loop. Did you find a workaround by any chance, or is there any news on this problem? Unfortunately, this prevents us from switching to C++20 right now. :-(

Unfortunately, this is still a problem with CUDA 12.2.1.

The workaround we are using is to move the “main” part defined by

#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>

into a .cc file, and implement only the tests in the .cu files:

#include <catch2/catch.hpp>

...

Then, the main part and the tests can be linked together with g++.

In this way nvcc and cudafe++ never see the “main” part.
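
As a rough sketch of the build, the steps could look like the commands below. The file names main.cc and test.cu are only examples, and the commands assume that Catch2 is on the default include path and that the CUDA runtime library is under /usr/local/cuda-12.1/lib64; adjust them to your setup.

```shell
# Assumed file names: main.cc holds CATCH_CONFIG_MAIN, test.cu holds the tests.
# Compile the Catch2 main() with the host compiler only, so cudafe++ never parses it.
g++ -std=c++20 -c main.cc -o main.o
# Compile the tests with nvcc as before.
/usr/local/cuda-12.1/bin/nvcc -std=c++20 -c test.cu -o test.o
# Link with g++; the CUDA runtime has to be added explicitly because nvcc is not driving the link.
g++ main.o test.o -o test -L/usr/local/cuda-12.1/lib64 -lcudart
```

If the .cu files use relocatable device code, an extra nvcc device-link step would be needed before the final g++ link; the sketch above assumes the default whole-program mode.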
