The following code gives inconsistent results depending on whether it is executed on the GPU or the CPU:
#include <iostream>
#include <cmath>

int main() {
  const int N = 10;
  float array[N];
  for (int i = 0; i < N; ++i) {
    array[i] = HUGE_VALF; // <= infinity
  }

  float min = HUGE_VALF;
  #pragma acc parallel loop vector reduction(min:min) copyin(array)
  for (int i = 0; i < N; i++) {
    if (array[i] < min) min = array[i];
  }
  std::cout << min << std::endl;
}
This (correctly) returns inf when executed on the CPU but returns 3.40282e+38 (FLT_MAX) when executed on the GPU.
The bug is also present when swapping min with max and HUGE_VALF with -HUGE_VALF, and when using doubles instead of floats.
This was tested with versions 23.7 and 24.5 on the Ampere and Turing architectures. I wasn’t able to test with the 25.7 version of the SDK or on newer CUDA architectures; I hope it hasn’t already been corrected there.
The issue here is that the OpenACC standard specifies that the initial value for the private copy of a min reduction variable is the largest value of that type, i.e. FLT_MAX in this case.
See section 2.5.15: https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.4.pdf
I’ve submitted an RFE, TPR #37611, to see what, if anything, we can do to help with this case.
-Mat
Hello Mat, thank you for your answer and the pointer to the OpenACC standard.
I see your point but I will raise two counterarguments:
- as I see it, the largest representable value for floats is HUGE_VALF (positive infinity), not FLT_MAX,
- it doesn’t explain why the program behaves differently depending on where it is executed.
Anyways, thanks again for your answer and thank you for submitting a bug report.
Fair point, and engineering may change the initial partial reduction value to HUGE_VALF, though I don’t know what other considerations are needed.
For multicore, reductions are much simpler so no initial value is needed.