NVCC silently compiles std::swap to incorrect code (with no error or warning) in certain scenarios

leif5 · February 14, 2025, 11:02pm

Steps to reproduce:

Launch an A100 instance on Lambda.
Install the latest CUDA toolkit, then compile and run the reproduction below (saved as repro.cu):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
/usr/local/cuda-12.8/bin/nvcc -std=c++20 --expt-relaxed-constexpr -arch=sm_80 -o repro repro.cu && ./repro

// Compile with:
// nvcc -std=c++20 --expt-relaxed-constexpr -arch=sm_80 -o repro repro.cu && ./repro 

// If you remove `-std=c++20`, you get the correct error message:
// error: identifier "std::swap<    ::MyStruct> " is undefined in device code

// If you remove `--expt-relaxed-constexpr`, you get a misleading error message:
//  error: calling a constexpr __host__ function("swap") from a __global__ function("reproKernel") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

#include <stdio.h>
#include <cuda/std/type_traits>

struct Foo {
	int32_t foo;

	__host__ __device__ Foo(const Foo& other) : foo(other.foo) {}
	// ^ comment out this line, and the swap succeeds (?!?!?!?!)

	__host__ __device__ constexpr Foo() : foo(1337){}
	// remove this constexpr ^ to get the correct error message:
	// error: identifier "std::swap<    ::MyStruct> " is undefined in device code
};

struct MyStruct {
	Foo foo; // <-- comment out this line, and the swap succeeds (?!?!?!?!)
	int32_t bar;
};

__global__ void reproKernel() {
	MyStruct A{.bar = 123};
	MyStruct B{.bar = 456};

	printf("Before swap %d %d (expect: 123 456)\n", A.bar, B.bar);
	std::swap(A, B);
	printf("After swap %d %d (expect: 456 123)\n", A.bar, B.bar);

#if 0
	::cuda::std::swap(A, B);
	printf("After second swap %d %d (expect 123 456) (note that enabling this #if **made the first swap also succeed** (?!?!?!))\n", A.bar, B.bar);
#endif
}

int main() {
	reproKernel<<<1, 1>>>();
	cudaDeviceSynchronize();
}
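
Until this is resolved, one way to avoid depending on the host-only `std::swap` in device code is to call `::cuda::std::swap` (as in the `#if 0` block above) or to write the swap out by hand. The following is a minimal sketch of the second option, not a verified fix: `device_swap`, `swap_roundtrip_ok`, and the `HD` macro are hypothetical names introduced only for this illustration, and it has not been tested against the affected nvcc versions. The macro expands to nothing under a plain host compiler, so the same file also builds without nvcc.

```cpp
#include <cstdint>

// Under nvcc, mark functions as callable from both host and device code.
// Under a plain host compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Same shape as the structs in the reproduction above.
struct Foo {
	int32_t foo;

	HD Foo(const Foo& other) : foo(other.foo) {}
	HD constexpr Foo() : foo(1337) {}
};

struct MyStruct {
	Foo foo;
	int32_t bar;
};

// Hypothetical helper (not part of CUDA or the standard library):
// swaps two objects with one copy construction and two copy assignments.
template <typename T>
HD void device_swap(T& a, T& b) {
	T tmp(a);
	a = b;
	b = tmp;
}

// Hypothetical usage check: returns true when the `bar` fields were exchanged.
HD inline bool swap_roundtrip_ok() {
	MyStruct A{};
	A.bar = 123;
	MyStruct B{};
	B.bar = 456;
	device_swap(A, B);
	return A.bar == 456 && B.bar == 123;
}
```

Inside `reproKernel`, the call `std::swap(A, B);` would then become `device_swap(A, B);`. Because the helper is explicitly annotated for device use, it does not rely on `--expt-relaxed-constexpr` to make a host `constexpr` function callable from the kernel.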

Robert_Crovella · February 15, 2025, 12:09am

you may wish to file a bug.

leif5 · February 15, 2025, 6:23am

I have filed a bug report (the link requires an NVIDIA Developer login).
