OpenMP 1D reduction leads to internal compiler error

joanib14 · February 27, 2026, 2:37pm

Hello,

When compiling my program with nvc++ I get the following compiler error:
NVC++-F-0000-Internal compiler error. unhandled ilm to find symbol 0 (reproducer.cpp: 22)

This appears to be related to using an OpenMP reduction on an array variable (here, an array of 8 doubles).
#pragma omp parallel for simd reduction(+:sums)
If I remove the reduction or change it to a scalar reduction, then the program compiles. Below is a simple program that reproduces the issue:

// Compile with:
//   nvc++ -std=c++20 -mp -O0 -g reproducer.cpp -o repro
//
// The ICE is triggered by `#pragma omp parallel for simd` with an array reduction.
// Changing to `#pragma omp parallel for` (without simd) avoids the crash.

#include <cstdio>
#include <vector>

int main() {
    constexpr int N = 1024;
    constexpr int chunk_size = 8;

    std::vector<double> x(N, 1.0);

    std::vector<std::vector<double>> ys(chunk_size, std::vector<double>(N, 2.0));

    double sums[chunk_size] = {};

    // ICE: omp parallel for simd + array reduction
    #pragma omp parallel for simd reduction(+:sums)
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < chunk_size; ++k) {
            sums[k] += x[i] * ys[k][i];
        }
    }

    for (int k = 0; k < chunk_size; ++k)
        std::printf("sums[%d] = %f\n", k, sums[k]);

    return 0;
}

I have tried this example with nvc++ 25.9 and 26.1, with the same result:

> nvc++ --version

nvc++ 26.1-0 64-bit target on x86-64 Linux -tp znver4 
NVIDIA Compilers and Tools
Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> nvc++ -std=c++20 -mp -O0 -g reproducer.cpp -o repro
NVC++-F-0000-Internal compiler error. unhandled ilm to find symbol       0  (reproducer.cpp: 22)
NVC++/x86-64 Linux 26.1-0: compilation aborted

MatColgrove · February 27, 2026, 4:17pm

Hi joanib14,

Thanks for the report. I’ve filed TPR #38292 and sent it to engineering for investigation.

As a workaround, you can use atomics instead. Given that array reductions are expensive, I’ve found that atomics are often faster, especially with offload. I’m not sure that holds in this case, given the sum array is small and you’re targeting the host, but it’s possible.

    #pragma omp parallel for 
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < chunk_size; ++k) {
            #pragma omp atomic update
            sums[k] += x[i] * ys[k][i];
        }
    }

-Mat
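
For anyone landing here later, a minimal sketch that puts the two options from this thread side by side: the atomic-update loop from the reply above, and the scalar reduction that the original post says compiles. The function names `sums_atomic` and `sums_scalar` are hypothetical, the scalar variant swaps the loop order so that each `sums[k]` becomes its own scalar reduction, and none of this has been checked against nvc++ 25.9 or 26.1, so treat it as an assumption rather than a confirmed workaround.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int chunk_size = 8;  // same value as in the reproducer

// Hypothetical helper: the atomic-update workaround suggested in the reply.
// No reduction clause and no simd, so the array reduction is avoided entirely.
void sums_atomic(const std::vector<double>& x,
                 const std::vector<std::vector<double>>& ys,
                 double* sums) {
    assert(ys.size() == static_cast<std::size_t>(chunk_size));
    const int n = static_cast<int>(x.size());
    for (int k = 0; k < chunk_size; ++k) sums[k] = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < chunk_size; ++k) {
            #pragma omp atomic update
            sums[k] += x[i] * ys[k][i];
        }
    }
}

// Hypothetical helper: one scalar reduction per output element.
// The k loop runs serially; each inner loop reduces into the scalar s.
void sums_scalar(const std::vector<double>& x,
                 const std::vector<std::vector<double>>& ys,
                 double* sums) {
    assert(ys.size() == static_cast<std::size_t>(chunk_size));
    const int n = static_cast<int>(x.size());

    for (int k = 0; k < chunk_size; ++k) {
        const std::vector<double>& y = ys[k];
        double s = 0.0;
        #pragma omp parallel for simd reduction(+:s)
        for (int i = 0; i < n; ++i) {
            s += x[i] * y[i];
        }
        sums[k] = s;
    }
}
```

Both helpers can be called with the `x` and `ys` vectors from the reproducer and a `double sums[chunk_size]` buffer. The scalar variant opens a parallel region once per `k`, which is fine for 8 elements but may cost more than the atomic version when `chunk_size` is large.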
