Reductions on pointer type must have bounds specified

I am using HPC SDK v22.7 with CUDA Toolkit 11.7, trying to compile some C/C++ code to do OpenMP target offloading onto my GeForce GTX 1650. I want to do a reduction on an array. I am not using CUDA unified memory, but I keep getting this compiler error:

NVC++-F-0155-Reductions on pointer type must have bounds specified:  heap_sum_arr

My simple program is something like this:

#define N 1000

double *heap_sum_arr = new double[N];
//initialize heap_sum_arr here

#pragma omp target data map(tofrom:heap_sum_arr[0:N])
{
    #pragma omp target teams loop reduction(+:heap_sum_arr[0:N])
    for (int i = 0; i < N; i++) {
        heap_sum_arr[i] += <some computations>
    }
}


delete [] heap_sum_arr;

The same thing happens with a malloc'ed C pointer. It seems to me I have specified the bounds in compliance with the OpenMP 5.0 standard.

It appears that OpenMP target offload array reduction only works with variables allocated on the stack, i.e. something declared with double arr[N];
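For instance, a stack-declared variant along these lines compiles for me without the error (just a sketch, with a dummy increment standing in for my actual computation):

#define N 1000

double stack_sum_arr[N];   // fixed-size array with automatic storage
//initialize stack_sum_arr here

#pragma omp target teams loop map(tofrom:stack_sum_arr) reduction(+:stack_sum_arr)
for (int i = 0; i < N; i++) {
    stack_sum_arr[i] += 1;   // dummy stand-in for the real computation
}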

This severely limits the sizes of reduction arrays I can work with.

Why is this so? Is this intended behavior? Is there a workaround?

PS: the compile command I use is:
nvc++ -mp=gpu -gpu=cc75,cuda11.7 src.c

Why is this so? Is this intended behavior?

The code looks correct to me, so it’s likely a compiler issue. I filed a problem report, TPR #32404, and have asked our engineers to investigate.

Is there a workaround?

In this particular example, the reduction isn't needed since each loop iteration accumulates into a unique index. Hence the workaround is to simply remove the reduction clause, as in the sketch below.
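For your snippet that would look roughly like this (sketch only, keeping your placeholder computation):

#pragma omp target data map(tofrom:heap_sum_arr[0:N])
{
    // No reduction clause: each iteration writes only its own element
    #pragma omp target teams loop
    for (int i = 0; i < N; i++) {
        heap_sum_arr[i] += <some computations>
    }
}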

If you did have code with collisions, then the workaround would be to use atomics. Also, array reductions can have a high overhead, especially for larger arrays, so atomics may be the better option anyway.

For example:

% cat test.cpp
#include <iostream>
#include <cstdlib>
#include <omp.h>
#define N 1000

using namespace std;

int main() {
    double *hist = new double[10];
    for (int i = 0; i < 10; i++) {
        hist[i] = 0;
    }
    // Iterations collide on the same bin, so the atomic update is needed
    #pragma omp target teams loop map(tofrom:hist[:10])
    for (int i = 0; i < N; i++) {
        int idx = i % 10;
        #pragma omp atomic update
        hist[idx] += 1;
    }
    for (int i = 0; i < 10; i++) {
        cout << i << ":" << hist[i] << endl;
    }
    delete [] hist;
}
% nvc++ test.cpp -mp=gpu; a.out
0:100
1:100
2:100
3:100
4:100
5:100
6:100
7:100
8:100
9:100

Thanks for the report,
Mat


Thank you, Mat, for reporting this issue. In the example I gave, I used indices with no collisions for simplicity's sake; my actual use case has collisions.

I don’t think atomics are the solution for me, because

  1. I am working with double complex numbers, and I don't think atomic update of complex numbers is currently supported (please correct me if I'm wrong). When I prepend #pragma omp atomic update to an accumulation statement on a double complex array, compilation fails with NVC++-S-1073-Procedures called in a OpenMP target region must have 'omp declare target' information - __atomic_load. This does not occur when the array type is double rather than double complex.

  2. I might be able to work around this by accumulating the real and imaginary parts of my calculations separately (roughly as sketched below), but this adds complexity to my code, and I have also heard that hitting the same address repeatedly with atomic statements carries a performance penalty. This might end up being slower than doing a reduction over a larger array. Please correct me if I'm wrong on this as well.
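For reference, the split-accumulation idea I have in mind would look roughly like this (just a sketch with made-up names and dummy contributions, not my actual code):

#include <complex>
#define N 1000

// Separate real/imaginary accumulators so plain double atomics can be used
double *sum_re = new double[N];
double *sum_im = new double[N];
//initialize sum_re and sum_im to zero here

#pragma omp target teams loop map(tofrom:sum_re[0:N],sum_im[0:N])
for (int i = 0; i < N; i++) {
    int idx = i % N;             // in my real code the indices collide
    double contrib_re = 1.0;     // dummy stand-in for the real part
    double contrib_im = 2.0;     // dummy stand-in for the imaginary part
    #pragma omp atomic update
    sum_re[idx] += contrib_re;
    #pragma omp atomic update
    sum_im[idx] += contrib_im;
}

// Reassemble the complex result on the host
std::complex<double> *heap_sum_arr = new std::complex<double>[N];
for (int i = 0; i < N; i++) {
    heap_sum_arr[i] = std::complex<double>(sum_re[i], sum_im[i]);
}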

Thanks!

Yes, I believe you're correct that there's no support for atomics on double complex (at least none that I'm aware of). I do see that we have an open RFE for this in OpenACC (which could be applied to OpenMP as well), but I don't have details on when/if engineering will implement it.

Yes, atomics can also incur a performance penalty, but this largely depends on the timing of when the threads access the same memory. If they all hit it at the same time, then yes, it can be a problem; but if the accesses are staggered, the overhead can be quite low. I have no idea what it would be in your case.

Another possibility is to switch to using OpenACC, since, at least for this example, we're able to compile and run the code correctly.

% cat test1.cpp
#include <iostream>
#define N 1000

using namespace std;

int main() {

    double *heap_sum_arr = new double[N];
    #pragma acc data copy(heap_sum_arr[:N])
    {
        #pragma acc parallel loop reduction(+:heap_sum_arr[0:N])
        for (int i = 0; i < N; i++) {
            heap_sum_arr[i] += 1;
        }
    }

    cout << heap_sum_arr[1] << endl;
    delete [] heap_sum_arr;

}
% nvc++ -acc test1.cpp -Minfo=accel -V22.7; a.out
main:
     10, Generating copy(heap_sum_arr[:1000]) [if not already present]
         Generating NVIDIA GPU code
         12, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
             Generating reduction(+:heap_sum_arr[:1000])
     10, Local memory used for heap_sum_arr
1
