OpenMP 1D reduction leads to internal compiler error

Hello,

When compiling my program with nvc++ I get the following compiler error:
NVC++-F-0000-Internal compiler error. unhandled ilm to find symbol 0 (reproducer.cpp: 22)

This appears to be related to the use of an OpenMP reduction on an array of 8 elements:
#pragma omp parallel for simd reduction(+:sums)
If I remove the reduction or change it to a scalar reduction, the program compiles. Below is a simple program that reproduces the issue:

// Compile with:
//   nvc++ -std=c++20 -mp -O0 -g reproducer.cpp -o repro
//
// The ICE is triggered by `#pragma omp parallel for simd` with an array reduction.
// Changing to `#pragma omp parallel for` (without simd) avoids the crash.

#include <cstdio>
#include <vector>

int main() {
    constexpr int N = 1024;
    constexpr int chunk_size = 8;

    std::vector<double> x(N, 1.0);

    std::vector<std::vector<double>> ys(chunk_size, std::vector<double>(N, 2.0));

    double sums[chunk_size] = {};

    // ICE: omp parallel for simd + array reduction
    #pragma omp parallel for simd reduction(+:sums)
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < chunk_size; ++k) {
            sums[k] += x[i] * ys[k][i];
        }
    }

    for (int k = 0; k < chunk_size; ++k)
        std::printf("sums[%d] = %f\n", k, sums[k]);

    return 0;
}
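
For reference, the non-simd variant mentioned in the comment at the top of the reproducer might look like the sketch below. The helper name `sums_without_simd` is hypothetical and exists only to make the loop reusable. The only intended difference from the reproducer is that `simd` is dropped from the directive.

```cpp
#include <array>
#include <vector>

constexpr int chunk_size = 8;

// Hypothetical helper: same loop as the reproducer, but with
// `parallel for` instead of `parallel for simd`.
std::array<double, chunk_size> sums_without_simd(
    const std::vector<double>& x,
    const std::vector<std::vector<double>>& ys) {
    double sums[chunk_size] = {};
    const int n = static_cast<int>(x.size());

    // Array reduction kept, simd removed.
    #pragma omp parallel for reduction(+:sums)
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < chunk_size; ++k) {
            sums[k] += x[i] * ys[k][i];
        }
    }

    // Copy into a std::array so the result can be returned by value.
    std::array<double, chunk_size> out{};
    for (int k = 0; k < chunk_size; ++k)
        out[k] = sums[k];
    return out;
}
```

This loses any explicit vectorization request on the outer loop, so it is a way to get the code building rather than a like-for-like replacement.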

I have tried this example with NVC++ 25.9 and 26.1, with the same result:

> nvc++ --version

nvc++ 26.1-0 64-bit target on x86-64 Linux -tp znver4 
NVIDIA Compilers and Tools
Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> nvc++ -std=c++20 -mp -O0 -g reproducer.cpp -o repro
NVC++-F-0000-Internal compiler error. unhandled ilm to find symbol       0  (reproducer.cpp: 22)
NVC++/x86-64 Linux 26.1-0: compilation aborted

Hi joanib14,

Thanks for the report. I’ve filed TPR #38292 and sent it to engineering for investigation.

As a workaround, you can use atomics instead. Given that array reductions are expensive, I’ve found that atomics are often faster, especially with offload. I’m not sure that holds in this case, since the sum array is small and you’re targeting the host, but it’s possible.

    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < chunk_size; ++k) {
            #pragma omp atomic update
            sums[k] += x[i] * ys[k][i];
        }
    }
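
A further option on the host, sketched here under assumptions, is to accumulate into a thread-private array and merge the partial results once per thread. This does by hand roughly what an array reduction does, without a `reduction` clause. The function name `accumulate_with_private_partials` is hypothetical, and whether this form avoids the ICE with nvc++ is untested.

```cpp
#include <vector>

constexpr int chunk_size = 8;

// Hypothetical helper: manual array reduction using per-thread partial sums.
void accumulate_with_private_partials(
    const std::vector<double>& x,
    const std::vector<std::vector<double>>& ys,
    double (&sums)[chunk_size]) {
    const int n = static_cast<int>(x.size());

    #pragma omp parallel
    {
        // Declared inside the parallel region, so each thread has its own copy.
        double local[chunk_size] = {};

        #pragma omp for nowait
        for (int i = 0; i < n; ++i) {
            for (int k = 0; k < chunk_size; ++k) {
                local[k] += x[i] * ys[k][i];
            }
        }

        // One merge per thread instead of one atomic per element update.
        #pragma omp critical
        {
            for (int k = 0; k < chunk_size; ++k)
                sums[k] += local[k];
        }
    }
}
```

Compared with the atomic version, contention is limited to one short critical section per thread, which may matter when `N` is large and the sum array is small.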

-Mat
