NVCC compiles incorrect code when using lambdas (and optimises away the kernel)

Hi,
I have encountered a bug in nvcc where it compiles C++ code that violates const correctness and should be rejected with a compiler error. Furthermore, the resulting CUDA kernel is a no-op.

Here’s a reproducer (build with nvcc --extended-lambda --save-temps):

#include <cstdint> // uint32_t

// Some user-defined type
struct s {
  int i;
  float f;
};

// Pointer wrapper class
template <typename T> struct my_span {
private:
  T *__restrict ptr_;
  uint32_t size_;

public:
  my_span(T *ptr = nullptr, uint32_t size = 0) : ptr_(ptr), size_(size) {}
  __host__ __device__ inline T &at(uint32_t x) { return ptr_[x]; }      // non-const accessor
  __host__ __device__ inline T at(uint32_t x) const { return ptr_[x]; } // const accessor
};

// CUDA kernel template that takes a __device__ lambda and runs it over some indices
template <typename T> __global__ static void kernel_runner(size_t n, T kernel) {
  auto gid = threadIdx.x + blockIdx.x * blockDim.x;
  if (gid < n) {
    kernel(gid);
  }
}

// Function template to dispatch the lambda to the GPU
template <typename T> static void launch_lambda_kernel(size_t N, T &&kernel) {
  kernel_runner<<<(N + 1023) / 1024, 1024>>>(N, kernel);
}

int main() {

  // Compiles whether or not the lambda is mutable. Without mutable it produces
  // an empty kernel, although it should not compile at all.
  auto in = my_span<s>{};
  launch_lambda_kernel(1000000, [=] __device__(int x) {
    in.at(x) = in.at(x + 1);
  });

  auto in_f = my_span<float>{};
  launch_lambda_kernel(1000000, [=] __device__(int x) mutable { // doesn't compile without mutable, as expected
    in_f.at(x) = in_f.at(x + 1);
  });

  return 0;
}

To be correct, this code requires the lambda to be marked mutable: variables captured by copy are const inside a non-mutable lambda, so only the const accessor is viable there, whereas the assignment .at(x) = needs the non-const accessor. When T is a fundamental type such as float, nvcc behaves correctly and refuses to compile the code if mutable is missing. However, when T is struct s, it compiles and generates an empty kernel. With mutable present, the generated code is correct.
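The overload selection described above can be checked on the host with an ordinary C++ compiler. The sketch below is a host-only copy of the reproducer with the __host__ __device__ qualifiers and the kernel launch removed; the global counters g_const_calls and g_nonconst_calls, the struct OverloadReport and the functions run_nonmutable and run_mutable are hypothetical additions for illustration only. It records which at overload each lambda calls and whether the write reaches the underlying array. It does not exercise nvcc's device compilation path.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

struct s {
  int i;
  float f;
};

// Hypothetical counters, added only to record which `at` overload runs.
static int g_const_calls = 0;
static int g_nonconst_calls = 0;

// Host-only copy of the reproducer's wrapper (no __host__ __device__).
template <typename T> struct my_span {
private:
  T *ptr_;
  uint32_t size_;

public:
  my_span(T *ptr = nullptr, uint32_t size = 0) : ptr_(ptr), size_(size) {}
  T &at(uint32_t x) { ++g_nonconst_calls; return ptr_[x]; }   // non-const accessor
  T at(uint32_t x) const { ++g_const_calls; return ptr_[x]; } // const accessor
};

// Summary of one lambda invocation.
struct OverloadReport {
  int const_calls;
  int nonconst_calls;
  int data0_i; // value of data[0].i after the call
};

// Non-mutable lambda: the by-copy capture `in` is const inside the body, so
// both calls pick the const accessor. The assignment targets the temporary
// `s` returned by value, so data[0] is left unchanged.
OverloadReport run_nonmutable() {
  g_const_calls = g_nonconst_calls = 0;
  s data[2] = {{1, 1.0f}, {2, 2.0f}};
  auto in = my_span<s>{data, 2};
  auto body = [=](uint32_t x) { in.at(x) = in.at(x + 1); };
  body(0);
  return {g_const_calls, g_nonconst_calls, data[0].i};
}

// Mutable lambda: the captured copy is non-const, so the non-const accessor
// is chosen and the write goes through to data[0].
OverloadReport run_mutable() {
  g_const_calls = g_nonconst_calls = 0;
  s data[2] = {{1, 1.0f}, {2, 2.0f}};
  auto in = my_span<s>{data, 2};
  auto body = [=](uint32_t x) mutable { in.at(x) = in.at(x + 1); };
  body(0);
  return {g_const_calls, g_nonconst_calls, data[0].i};
}

// Struct vs float: a class-type rvalue can be assigned to through its
// implicit copy assignment operator; a built-in-type rvalue cannot.
static_assert(std::is_assignable<s, s>::value, "class-type rvalue is assignable");
static_assert(!std::is_assignable<float, float>::value, "built-in rvalue is not assignable");
```

Calling run_nonmutable() from a host main() should report two const-accessor calls and an unchanged data[0].i of 1, while run_mutable() should report two non-const calls and data[0].i of 2. The two static_asserts compile-check the difference between the struct s and float cases on the host side.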

nvcc -V: release 11.5, V11.5.119

Best regards,
Michel