copyin behavior change in 12.5?

Hi, I upgraded to 12.5 today and found that three quarters of the codes I've been working with that use PGI ACC directives no longer compile. I'm not sure where in the release chain this started; I think the previous version in use was 12.3. Either way, this was the error:

make -C kmeans/
make[1]: Entering directory `/home/njustn/omp-co-repo/kmeans'
pgcc -mp=allcores  -O3 -fast -Minfo=accel,mp  -DPGI -I/opt/pgi/linux86-64/2012/cuda/4.1/include -I/opt/pgi/linux86-64/2012/include_acc -ta=nvidia,keepgpu,keepptx,nofma -c99   -I. -I../common -c omp_main.c -o omp_main.o
main:
     68, Parallel region activated
     78, Parallel region terminated
pgcc -mp=allcores  -O3 -fast -Minfo=accel,mp  -DPGI -I/opt/pgi/linux86-64/2012/cuda/4.1/include -I/opt/pgi/linux86-64/2012/include_acc -ta=nvidia,keepgpu,keepptx,nofma -c99   -I. -I../common  -c omp_kmeans.c 
omp_kmeans.001.gpu(43): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(43): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(43): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(59): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(59): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(59): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(75): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(75): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(75): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(76): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(76): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(105): error: expression must have integral or enum type

17 errors detected in the compilation of "/tmp/pgnvd_wob6Wbeavuz.nv0".
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (omp_kmeans.c: 210)
omp_kmeans:
    148, Parallel region activated
         Parallel loop activated with static block schedule
    153, Barrier
         Parallel region terminated
    164, Generating copyin(fo[0:1])
    185, Parallel region activated
    210, Generating copyin(fc[0:numClusters*numCoords])
         Generating copyin(gte)
         Generating copyin(gts)
         Generating copyin(numClusters)
         Generating copyin(numObjs)
         Generating copyin(numCoords)
    225, Loop is parallelizable
         Accelerator kernel generated
        225, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
    238, Loop is parallelizable
    244, Loop carried scalar dependence for 'min_dist' at line 255
         Scalar last value needed after loop for 'index' at line 260
    249, Loop is parallelizable
    271, Parallel region terminated

After some confusion, I realized that the compiler had, for some reason, decided that all of my basic int variables were actually char * pointers. These are the relevant chunks of code:

float** omp_kmeans(int     is_perform_atomic, /* in: */
                   float **objects,           /* in: [numObjs][numCoords] */
                   int     numCoords,         /* no. coordinates */
                   int     numObjs,           /* no. objects */
                   int     numClusters,       /* no. clusters */
                   float   threshold,         /* % objects change membership */
                   int    *membership)        /* out: [numObjs] */
...
#pragma acc region for \
                    deviceptr(data)\
                    deviceptr(cfo)\
                    private(index,i,j,k,dist,min_dist)\
                    copyin(numCoords,numObjs,numClusters,gts,gte)\
                    copyin(fc[0:numClusters*numCoords])

The important point is the int variables numCoords, numObjs, etc. When the copyin clause for them is removed, the code compiles successfully with 12.5, whereas both versions compile successfully with 11.10 (the other release I have immediate access to). Is this intentional behavior?
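A minimal sketch of the workaround described above, applied to a simplified k-means assignment loop: only the arrays appear in data clauses, and the int scalars are left out of copyin() so the compiler handles them on its own. The function assign_clusters, the flat array layout, and the copyout clause are assumptions for illustration, not the original omp_kmeans code; the `region for` syntax simply follows the directive quoted in the post and has not been checked against 12.5. A compiler that does not recognize the pragma just runs the loop on the host.

```c
#include <float.h>

/* Hypothetical, simplified k-means assignment step (not the original code).
   objs:     flat [numObjs * numCoords] array
   clusters: flat [numClusters * numCoords] array
   membership receives the index of the nearest cluster for each object. */
void assign_clusters(const float *objs, const float *clusters, int *membership,
                     int numObjs, int numCoords, int numClusters)
{
    int i, j, k;
    /* Workaround sketch: only arrays are listed in data clauses; the int
       scalars numObjs, numCoords and numClusters are NOT in copyin(). */
#pragma acc region for private(j,k) \
    copyin(objs[0:numObjs*numCoords], clusters[0:numClusters*numCoords]) \
    copyout(membership[0:numObjs])
    for (i = 0; i < numObjs; i++) {
        int best = 0;
        float best_dist = FLT_MAX;
        for (k = 0; k < numClusters; k++) {
            float dist = 0.0f;
            /* squared Euclidean distance between object i and cluster k */
            for (j = 0; j < numCoords; j++) {
                float d = objs[i * numCoords + j] - clusters[k * numCoords + j];
                dist += d * d;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = k;
            }
        }
        membership[i] = best;
    }
}
```

For example, with objects (0,0), (10,10), (1,0) and clusters (0,0), (10,9), the resulting membership is {0, 1, 0}.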

Hi njustn,

Is this intentional behavior?

Yes and no. To support C pointer manipulation, our engineers completely revamped how pointers are handled on the device. Hence the generic “char *” pointers are expected; the “expression must have arithmetic or enum type” error, however, was not. While we'd need to see your code to be sure, this message is very similar to another report (TPR#18694), which will be fixed in the next release.
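A loose, hypothetical illustration in plain C of why the mix-up produces this error. This is not PGI's generated code; the function scaled_elem and its parameters are invented for the example. When array data is reached through a generic byte pointer, it has to be cast back to its element type before any arithmetic, whereas a scalar passed by value can be used directly.

```c
#include <stddef.h>

/* Hypothetical illustration only, not compiler output.
   The array is addressed through an untyped byte pointer and cast back to
   float before use; the scalar `scale` is passed by value as an int. */
float scaled_elem(const char *base, size_t i, int scale)
{
    /* step i elements forward in bytes, then reinterpret as a float */
    const float *elem = (const float *)(base + i * sizeof(float));
    return *elem * (float)scale;
}
```

If `scale` were instead declared as `const char *`, as the scalars reportedly were in the generated device code, the multiplication would be rejected, because C does not allow arithmetic with a pointer operand; the device compiler's front end reports that kind of mistake as "expression must have arithmetic or enum type".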

If you can, please send a reproducing example of the error to PGI Customer Support (trs@pgroup.com) and ask them to forward it to me. I'll then confirm whether it's the same issue.

Thanks,
Mat

If you would like a complete application, I'll be happy to send along one of the ones that failed, but the build setup for my full applications is somewhat complicated at the moment, so it might be more trouble than it's worth for you. A minimal reproducing example is quite simple, though, and is copied here. The issue appears to be that before 12.5, plain non-pointer values were identified and treated accordingly; now they are blindly treated as “char *” values in the generated CUDA output, causing arithmetic expressions to fail loudly.

int main(int argc, char * argv[]){
    int i=0, j=0;
#pragma acc region for copyin(argc)
    for(i=0; i<500000; i++){
        argv[i][0] *= argc;
    }
    return 0;
}
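The reproducer above is meant to be compiled, not run: at run time `argv[i]` for i up to 500000 reads far past the real argument list. A hypothetical, bounds-safe variant that can also be executed might look like the sketch below; the function scale_first_chars and its parameters are assumptions, and the `copy(rows[0:nrows][0:1])` clause mirrors the `copy(argv[0:500000][0:1])` that the compiler reports generating in the -Minfo output further down. Because the scalar `factor` is still listed in copyin(), it presumably exercises the same 12.5 problem; with the pragma ignored it simply scales the first character of each string on the host.

```c
/* Hypothetical, bounds-safe variant of the reproducer: only the nrows
   strings that actually exist are touched. The scalar `factor` is kept in
   copyin() on purpose, as in the original report. */
void scale_first_chars(char **rows, int nrows, int factor)
{
    int i;
#pragma acc region for copyin(factor) copy(rows[0:nrows][0:1])
    for (i = 0; i < nrows; i++) {
        /* multiply the first character of row i by the scalar factor */
        rows[i][0] = (char)(rows[i][0] * factor);
    }
}
```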

Yep, this is the same problem. So, assuming it is identical to the issue in your full program, it will be fixed in 12.6.

- Mat

% pgcc -acc njustin.c -V12.5 -Minfo=accel -Msafeptr
/tmp/pgaccbcigdvJW3WzI.gpu(20): error: expression must have arithmetic or enum type

1 error detected in the compilation of "/tmp/pgnvdOdig4-pDmu03.nv0".
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (jtin.c: 4)
main:
      4, Generating copyin(argc)
         Generating copy(argv[0:500000][0:1])
      5, Loop is parallelizable
         Accelerator kernel generated
          5, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
PGC/x86-64 Linux 12.5-0: compilation completed with warnings
% pgcc -acc njustin.c -V12.6 -Minfo=accel -Msafeptr
main:
      4, Generating copyin(argc)
         Generating copy(argv[0:500000][0:1])
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
      5, Loop is parallelizable
         Accelerator kernel generated
          5, #pragma acc for parallel /* blockIdx.x */
             CC 1.0 : 9 registers; 32 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 12 registers; 0 shared, 48 constant, 0 local memory bytes
%

That’s great news, thanks. I feel a great deal more comfortable when I have all of my variable copies explicitly listed out rather than trusting the automatic copies to behave as I expect.