Hi,
I have encountered a bug with nvcc where it compiles C++ code that violates const correctness and should result in a compiler error. Furthermore, the resulting CUDA kernel is a no-op.
Here’s a reproducer (build with `--extended-lambda --save-temps`):
```cpp
#include <cstddef>  // size_t
#include <cstdint>  // uint32_t

// Some user-defined type
struct s {
  int i;
  float f;
};

// Pointer wrapper class
template <typename T> struct my_span {
private:
  T *__restrict ptr_;
  uint32_t size_;

public:
  my_span(T *ptr = nullptr, uint32_t size = 0) : ptr_(ptr), size_(size) {}
  __host__ __device__ inline T &at(uint32_t x) { return ptr_[x]; }       // non-const accessor
  __host__ __device__ inline T at(uint32_t x) const { return ptr_[x]; }  // const accessor
};

// CUDA kernel template that takes a __device__ lambda and runs it over some indices
template <typename T> __global__ static void kernel_runner(size_t n, T kernel) {
  auto gid = threadIdx.x + blockIdx.x * blockDim.x;
  if (gid < n) {
    kernel(gid);
  }
}

// Function template to dispatch the lambda to the GPU
template <typename T> static void launch_lambda_kernel(size_t N, T &&kernel) {
  kernel_runner<<<(N + 1023) / 1024, 1024>>>(N, kernel);
}

int main() {
  // Compiles whether or not the lambda is mutable, but without mutable the
  // generated kernel is empty (it shouldn't compile in the first place).
  auto in = my_span<s>{};
  launch_lambda_kernel(1000000, [=] __device__(int x) {
    in.at(x) = in.at(x + 1);
  });

  // Doesn't compile without mutable, as expected
  auto in_f = my_span<float>{};
  launch_lambda_kernel(1000000, [=] __device__(int x) mutable {
    in_f.at(x) = in_f.at(x + 1);
  });
  return 0;
}
```
For this code to be correct, the lambda has to be marked `mutable`, because the accessor that `.at(x) =` is meant to call is the non-const one. (Without `mutable`, the by-copy capture `in` is `const` inside the lambda, so the call resolves to the const accessor, which returns a copy.) When the element type is a fundamental type such as `float`, nvcc behaves correctly and refuses to compile the code if `mutable` is missing. However, if `T` is `struct s`, it compiles and generates an empty kernel. If `mutable` is present, the generated code is correct.
`nvcc -V`: release 11.5, V11.5.119
Best regards,
Michel