Segmentation fault when compiling a simple kernel

Code:

#include <cuda_runtime.h>
#include <cstdint>
__global__
void d_kernel(uint32_t *inout){
    uint32_t v = inout[threadIdx.x];
    uint32_t v1 = v;
    for (uint32_t i = 0; i < 10000000; i++) {
        v *= v;
        v1 = v1 + v;
    }
    inout[threadIdx.x] = v + v1;
}

int main() {
    d_kernel<<<1,1>>>(nullptr);
}

Compiler:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

OS:
Ubuntu 22.04.4 LTS

Full command:
nvcc -o main -arch=sm_86 XX.cu

I suggest:

  1. retest on the latest CUDA (12.5, currently)
  2. if it still fails, file a bug.
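
When retesting, it can help to confirm which toolkit version a build actually picked up, since several CUDA installs often coexist on one machine. CUDA encodes versions as a single integer, 1000 * major + 10 * minor, so 12.3 is 12030. Below is a minimal host-only sketch of decoding that integer; the names CudaVersion, decode_cuda_version and print_cuda_version are hypothetical helpers for illustration, not part of any CUDA API.

```cpp
#include <cstdio>

// Decoded toolkit version. Hypothetical helper type, not a CUDA API type.
struct CudaVersion {
    int major;
    int minor;
};

// CUDA encodes a version as 1000 * major + 10 * minor (e.g. 12.3 -> 12030).
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Print a labelled, human-readable version such as "toolkit: 12.3".
inline void print_cuda_version(const char *label, int encoded) {
    const CudaVersion v = decode_cuda_version(encoded);
    std::printf("%s: %d.%d\n", label, v.major, v.minor);
}
```

In a .cu file, the encoded value could come from the CUDART_VERSION macro in cuda_runtime.h (compile time) or from cudaRuntimeGetVersion (run time), for example print_cuda_version("toolkit", CUDART_VERSION).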

This has been reported as bug ID 4675651.

[Public] Hi xxxx

Thanks for filing a bug ticket. I was able to reproduce this in house. Our compiler engineering team will investigate the issue, and we will keep you informed.

Best,
Yuki

[Public] Hi

Credit to our compiler engineering team: the issue is fixed and verified in house. The fix will be part of the CUDA 12.x release after next. Thanks again for reporting this to us.

Best,
Yuki
