A simple addition involving the minimum value of an integer type (e.g., LONG_MIN) fails to compile for the long int and long long int types when it appears in an OpenACC compute region.
Compiling the example below with NVHPC 23.11 produces the following error for the long and long long int types:
NVC++-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): mm_add failure
What is wrong with this example?
#include <openacc.h>
#include <stdio.h>
#include <climits>

#ifndef TEST
#define TEST 1
#endif

#if TEST == 1
#define INTTYPE int
#define MIN_VALUE INT_MIN
#elif TEST == 2
#define INTTYPE long
#define MIN_VALUE LONG_MIN
#elif TEST == 3
#define INTTYPE long long
#define MIN_VALUE LLONG_MIN
#endif

int main() {
#if TEST == 1
    printf("==> Test int type with INT_MIN (%d)!\n", MIN_VALUE);
#elif TEST == 2
    printf("==> Test long int type with LONG_MIN (%ld)!\n", MIN_VALUE);
#elif TEST == 3
    printf("==> Test long long int type with LLONG_MIN (%lld)!\n", MIN_VALUE);
#endif
    INTTYPE A[32];
    #pragma acc parallel loop copyout(A)
    for(int i=0; i<32; i++) {
        A[i] = i + MIN_VALUE;
        //A[i] = i - 1; //This works well
    }
#if TEST == 1
    printf("A[0] = %d\n", A[0]);
#elif TEST == 2
    printf("A[0] = %ld\n", A[0]);
#elif TEST == 3
    printf("A[0] = %lld\n", A[0]);
#endif
    return 0;
}
I’ve not seen this error before, so I’m not sure what’s wrong. I’ve opened a problem report, TPR#35258, and sent it to engineering for investigation.
My guess is that it’s some type of underflow when the compiler translates LONG_MIN or LLONG_MIN, both of which expand to "(-9223372036854775807L - 1L)", in the kernel itself. As a workaround, assigning LONG_MIN to a variable and then using that variable in the region seems to work correctly:
    INTTYPE A[32];
    INTTYPE val = MIN_VALUE;
    #pragma acc parallel loop copyout(A[:32])
    for(INTTYPE i=0; i<32; i++) {
        A[i] = val+i;
        //A[i] = i - 1; //This works well
    }
% nvc++ -acc test2.cpp -Minfo=accel -DTEST=3
main:
29, Generating copyout(A[:]) [if not already present]
Generating NVIDIA GPU code
31, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
31, Generating implicit firstprivate(val)
% a.out
==> Test long long int type with LLONG_MIN (-9223372036854775808)!
A[0] = -9223372036854775808
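For reference, here is the same workaround pulled out into a small standalone function. This is only a sketch: the helper name fill_from_min and its parameters are made up for illustration and are not part of the original test. It assumes, as suggested above, that the problem is triggered by the macro's expanded expression appearing directly inside the compute region, so the value is copied into an ordinary (non-const) host variable first.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper (name and signature are illustrative only).
// Fills out[0..n) with LLONG_MIN + i, keeping the LLONG_MIN macro
// outside the OpenACC compute region.
void fill_from_min(long long *out, int n) {
    assert(out != nullptr && n >= 0);

    // Hoist the macro into a plain host variable, as in the workaround
    // above. The compute region then only sees the scalar "val", which
    // the compiler passes in as an implicit firstprivate.
    long long val = LLONG_MIN;

    #pragma acc parallel loop copyout(out[0:n])
    for (int i = 0; i < n; i++) {
        out[i] = val + i;   // i is promoted to long long before the add
    }
}
```

Built with nvc++ -acc -Minfo=accel, the loop should be offloaded as in the output above. Built without -acc, the pragma is ignored and the loop runs on the host, which is a quick way to check the expected values.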
@Mat I verified that the new code works correctly. However, in my actual program, LONG_MIN is used in a deeply nested device function; a simplified version looks like this: