Why does a simple addition with a long (or long long) type min value fail in an OpenACC compute region?

Seyong · February 16, 2024, 7:45pm

A simple addition with the min value of integer type (e.g., LONG_MIN) fails for long int or long long int type when computed in an OpenACC compute region.

If the example below is compiled with NVHPC V23.11, the following compilation error occurs for long and long long int types:

NVC++-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): mm_add failure

What is wrong with this example?

#include <openacc.h>
#include <stdio.h>
#include <climits>

#ifndef TEST
#define TEST 1
#endif

#if TEST == 1
#define INTTYPE int
#define MIN_VALUE INT_MIN
#elif TEST == 2
#define INTTYPE long 
#define MIN_VALUE LONG_MIN
#elif TEST == 3
#define INTTYPE long long
#define MIN_VALUE LLONG_MIN 
#endif

int main() {
#if TEST == 1
    printf("==> Test int type with INT_MIN (%d)!\n", MIN_VALUE);
#elif TEST == 2
    printf("==> Test long int type with LONG_MIN (%ld)!\n", MIN_VALUE);
#elif TEST == 3
    printf("==> Test long long int type with LLONG_MIN (%lld)!\n", MIN_VALUE);
#endif
    INTTYPE A[32];
    #pragma acc parallel loop copyout(A)
    for(int i=0; i<32; i++) {
        A[i] = i + MIN_VALUE;   
        //A[i] = i - 1; //This works well
    }   
#if TEST == 1
    printf("A[0] = %d\n", A[0]);
#elif TEST == 2
    printf("A[0] = %ld\n", A[0]);
#elif TEST == 3
    printf("A[0] = %lld\n", A[0]);
#endif
    return 0;
}

MatColgrove · February 20, 2024, 10:18pm

Hi Seyong,

I’ve not seen this error before so not sure what’s wrong. Hence I’ve opened a problem report, TPR#35258, and sent it to engineering for investigation.

My guess is that it’s some type of underflow when translating LONG_MIN or LLONG_MIN, both of which are "(-9223372036854775807L - 1L)", in the kernel itself. As a workaround, assigning LONG_MIN to a variable and then using that variable in the region seems to work correctly:

    INTTYPE A[32];
    INTTYPE val = MIN_VALUE;
    #pragma acc parallel loop copyout(A[:32])
    for(INTTYPE i=0; i<32; i++) {
        A[i] = val+i;
        //A[i] = i - 1; //This works well
    }

% nvc++ -acc test2.cpp -Minfo=accel -DTEST=3
main:
     29, Generating copyout(A[:]) [if not already present]
         Generating NVIDIA GPU code
         31, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
     31, Generating implicit firstprivate(val)
% a.out
==> Test long long int type with LLONG_MIN (-9223372036854775808)!
A[0] = -9223372036854775808

-Mat
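
For readers wondering why the macro expands to a subtraction rather than a single literal: the magnitude 9223372036854775808 does not fit in a signed 64-bit type, so headers typically spell the minimum as the negated maximum minus one. The host-only sketch below checks that identity at compile time and mirrors the "assign to a variable first" workaround. It is not from the thread: `spelled_out_min` and `offset_from_min` are hypothetical names, it assumes a 64-bit `long long`, and it says nothing about what the 23.11 device compiler actually does with the expression.

```cpp
#include <climits>
#include <limits>

// Sketch only: assumes long long is 64 bits wide.
static_assert(sizeof(long long) == 8, "sketch assumes a 64-bit long long");

// Hypothetical helper: the expression quoted in the thread, written with
// the LL suffix so it is a long long on every platform.
constexpr long long spelled_out_min() {
    return -9223372036854775807LL - 1LL;
}

// The spelled-out expression, the macro, and numeric_limits all agree.
static_assert(spelled_out_min() == LLONG_MIN, "expression equals LLONG_MIN");
static_assert(LLONG_MIN == std::numeric_limits<long long>::min(),
              "macro equals numeric_limits minimum");
// The minimum is one further from zero than the maximum.
static_assert(LLONG_MIN + LLONG_MAX == -1, "asymmetric two's-complement range");

// Hypothetical helper mirroring the workaround: copy the constant into a
// variable first, then add the loop index to that variable.
long long offset_from_min(int i) {
    long long val = LLONG_MIN;
    return val + i;  // no overflow for i >= 0
}
```

On the host, `offset_from_min(0)` yields the same value as the `A[0]` line printed in the output above.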

Seyong · February 21, 2024, 4:32am

@Mat I verified that the new code works correctly. However, in my actual program, LONG_MIN is used in a deeply nested device function; a simplified version looks like this:

#pragma acc routine seq
void intAdd(INTTYPE *A, int i) {
    INTTYPE minVal = MIN_VALUE;
    A[i] = i + minVal;  
}
int main() {
    INTTYPE A[32];
    #pragma acc parallel loop copyout(A)
    for(int i=0; i<32; i++) {
        intAdd(A, i);
    }
    return 0;
}

The above code fails with the same compile-time error as the original code.
So, I had to use the following version to make it compilable:

INTTYPE minVal = MIN_VALUE;
#pragma acc declare copyin(minVal)

#pragma acc routine seq
void intAdd(INTTYPE *A, int i) {
    A[i] = i + minVal;  
}
int main() {
    INTTYPE A[32];
    #pragma acc parallel loop copyout(A)
    for(int i=0; i<32; i++) {
        intAdd(A, i);
    }
    return 0;
}

Is there another way to handle this issue when LONG_MIN is used in a deeply nested device function?

MatColgrove · February 21, 2024, 5:20pm

Using the global variable is probably the best workaround for now.

Seyong · February 21, 2024, 5:26pm

Thank you for confirming this.
I hope that this seemingly simple bug will be fixed in the next release of NVHPC.

MatColgrove · March 13, 2024, 5:13pm

Hi Seyong,

Engineering let me know that TPR #35258 has been fixed in our 24.3 release.

-Mat
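
Since the fix landed in 24.3, code that still has to build with older releases could keep the global-variable workaround behind a version check. The sketch below is one possible way to do that and is not from the thread. It assumes nvc++ predefines `__NVCOMPILER`, `__NVCOMPILER_MAJOR__`, and `__NVCOMPILER_MINOR__` (worth confirming in the compiler documentation for your release); `USE_MIN_WORKAROUND` and `run_first` are hypothetical names.

```cpp
#include <climits>

// Assumption: nvc++ predefines these version macros. With any other
// compiler the check is false and the direct LONG_MIN path is used.
#if defined(__NVCOMPILER) && \
    (__NVCOMPILER_MAJOR__ < 24 || \
     (__NVCOMPILER_MAJOR__ == 24 && __NVCOMPILER_MINOR__ < 3))
#define USE_MIN_WORKAROUND 1  // hypothetical macro name
#else
#define USE_MIN_WORKAROUND 0
#endif

#if USE_MIN_WORKAROUND
// Pre-24.3 path: the global-variable workaround from the thread.
long minVal = LONG_MIN;
#pragma acc declare copyin(minVal)
#endif

#pragma acc routine seq
void intAdd(long *A, int i) {
#if USE_MIN_WORKAROUND
    A[i] = i + minVal;    // read the device copy of the global
#else
    A[i] = i + LONG_MIN;  // use the macro directly
#endif
}

// Hypothetical driver: fills the array and returns its first element.
long run_first() {
    long A[32];
    #pragma acc parallel loop copyout(A)
    for (int i = 0; i < 32; i++) {
        intAdd(A, i);
    }
    return A[0];
}
```

Compiled without OpenACC support, the pragmas are ignored and the loop runs on the host, so `run_first()` returns `LONG_MIN` either way.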