nvcc silently accepts host-only C++ standard library code in device code

nvcc accepts invalid device code when a constexpr function template that uses host-only C++ standard library types (e.g. std::string) is instantiated and called from a CUDA kernel, particularly when compiling with --std=c++20 --expt-relaxed-constexpr. The code compiles without any diagnostic even though such constructs are not supported in device code. This can lead to silent miscompilation or undefined behavior at runtime.

A reproducer can be as simple as:

#include <cstdio>
#include <string>

template <int I>
constexpr void dump() {
	std::string data = std::to_string(I);
	printf("%s\n", data.data());
}

template <int I>
__global__ void kernel() {
	dump<I>();
}

int main() {
	printf("Before\n");
	cudaDeviceSynchronize();
	kernel<42><<<1, 1>>>();
	cudaDeviceSynchronize();
	printf("After\n");
}

Compiling with:

/usr/local/cuda-13.0/bin/nvcc -ccbin /opt/rh/gcc-toolset-14/root/usr/bin/gcc -std=c++20 --expt-relaxed-constexpr test.cu -o test

Further investigation indicates that the behavior depends on the host compiler. For example, nvcc v13.0.88 with gcc 11.5.0 produces the expected compilation error, whereas switching to gcc 14.2.1 suppresses the error and lets the invalid code compile.

As an example, nvcc 12.9.1 with gcc (GCC) 11.5.0 produces the correct error messages:

<source>(17): error: calling a __host__ function("void  ::dump<(int)42> ()") from a __global__ function("kernel<(int)42> ") is not allowed

<source>(17): error: identifier "dump<(int)42> " is undefined in device code
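
For contrast, here is a minimal sketch of a variant that stays within what device code supports and also checks the CUDA runtime status after the launch, which the reproducer does not do. The CUDA_CHECK macro is a hypothetical helper introduced only for this illustration, and replacing std::string with a plain printf("%d") is my assumption about what an equivalent device-compatible dump could look like, not something taken from the bug report:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original report): abort with a
// message if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Device-compatible counterpart of dump<I>(): no std::string, only the
// printf that CUDA supports in device code. The explicit execution space
// specifiers avoid relying on --expt-relaxed-constexpr.
template <int I>
__host__ __device__ void dump() {
    printf("%d\n", I);
}

template <int I>
__global__ void kernel() {
    dump<I>();
}

int main() {
    printf("Before\n");
    kernel<42><<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // launch or configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    printf("After\n");
    return 0;
}
```

Adding the same two checks to the original reproducer's main() should at least show whether the binary accepted with gcc 14.2.1 fails at launch or during execution, instead of relying on the printed output alone.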

Submitted as NVIDIA bug #5840342.
For a trivial reproducer, see the GitHub repository Electricks94/nvidia_bug_5840342.