Incorrect result when using the OpenACC min/max reducer over arrays containing only +/-Inf with nvhpc-24.5

The following code gives inconsistent results depending on whether it runs on the GPU or the CPU:

#include <iostream>
#include <cmath>

int main(){
  const int N = 10;
  
  float array[N];
  for (int i=0; i<N; ++i) {
    array[i] = HUGE_VALF; // +infinity
  }

  float min = HUGE_VALF;

#pragma acc parallel loop vector reduction(min:min) copyin(array)
  for (auto i = 0; i < N; i++) {
    if (array[i] < min) min = array[i];
  }

  std::cout << min << std::endl;
}

This (correctly) returns inf when executed on the CPU but returns 3.40282e+38 (MAX_FLOAT, i.e. the largest finite float, FLT_MAX in <cfloat>) when executed on the GPU.
The bug is also present when swapping min with max and HUGE_VALF with -HUGE_VALF and when using doubles instead of floats.

This was tested with versions 23.7 and 24.5 on the Ampere and Turing architectures. I wasn’t able to test with version 25.7 of the SDK or with newer CUDA architectures, so I hope this hasn’t already been fixed.

The issue here is that the OpenACC standard defines the initial value for the private copy of a min reduction variable as the largest value of that type, i.e. MAX_FLOAT in this case.

See section 2.5.15: https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.4.pdf
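
To make the effect concrete, here is a minimal host-only sketch of what such a reduction does when the private copy starts at the largest finite float. The function name emulate_min_reduction is hypothetical, and the single private copy is a simplifying assumption; this is not the code the compiler actually generates.

```cpp
#include <cmath>    // HUGE_VALF
#include <cstddef>  // std::size_t
#include <limits>   // std::numeric_limits

// Hypothetical emulation of a min reduction (illustration only).
// Assumption: the private copy starts at the largest finite float,
// not at the value the reduction variable had before the loop.
float emulate_min_reduction(const float* data, std::size_t n, float original) {
  float priv = std::numeric_limits<float>::max();  // initial value of the private copy
  for (std::size_t i = 0; i < n; ++i) {
    // For data[i] == +inf this test is false, so priv keeps its initial value.
    if (data[i] < priv) priv = data[i];
  }
  // At the end of the region the private copy is combined with the original value.
  return (priv < original) ? priv : original;
}
```

With an array holding only HUGE_VALF and original set to HUGE_VALF, the private copy never changes, and the final combine picks the finite initial value over the infinite original, which matches the 3.40282e+38 seen on the GPU.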

I’ve submitted an RFE, TPR #37611, to see what, if anything, we can do to help with this case.

-Mat

Hello Mat, thank you for your answer and the pointer to the OpenACC standard.
I see your point but I will raise two counterarguments:

  • as I see it, the largest representable value for floats is HUGE_VALF and not MAX_FLOAT,
  • it doesn’t explain why the program behaves differently depending on where it is executed.
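
The first point can be checked with a short sketch, assuming an IEEE 754 platform where HUGE_VALF is +infinity. Both function names are hypothetical, and the second only illustrates which identity value would preserve infinities; it is not what the compiler uses.

```cpp
#include <cmath>   // HUGE_VALF
#include <limits>  // std::numeric_limits

// True when float has an infinity that compares greater than
// its largest finite value (the case on IEEE 754 platforms).
bool infinity_exceeds_max_finite() {
  return std::numeric_limits<float>::has_infinity &&
         HUGE_VALF > std::numeric_limits<float>::max();
}

// Hypothetical identity for a min reduction that keeps +inf inputs intact:
// min(+inf, x) == x for every non-NaN x, including x == +inf.
float min_identity_with_infinity() {
  return std::numeric_limits<float>::infinity();
}
```

Starting the private copy from +infinity would leave an all-infinity input unchanged, whereas starting from the largest finite value cannot.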

Anyways, thanks again for your answer and thank you for submitting a bug report.

Fair point, and engineering may change the initial partial reduction value to HUGE_VALF, though I don’t know what other considerations are involved.

For multicore, reductions are much simpler so no initial value is needed.