Error when compiling HPL with OpenACC

Sorry for my poor English, but I really need some help.

The error output is below.
I checked the Makefile, and I am pretty sure the syntax is right.
Please tell me if I should set an environment variable or pass some parameter.

pgcc -acc -Minfo -o HPL_gpukernel.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -I/home/hpl/hpl-2.0-openacc/include -I/home/hpl/hpl-2.0-openacc/include/CUDA -I /usr/local/cuda/include -I/usr/local/cuda/include …/HPL_gpukernel.c
PGC-S-0035-Syntax error: Recovery attempted by replacing ‘,’ by ‘;’ (…/HPL_gpukernel.c: 672)
PGC-S-0035-Syntax error: Recovery attempted by replacing ‘;’ by ‘!=’ (…/HPL_gpukernel.c: 672)
PGC-S-0035-Syntax error: Recovery attempted by replacing ‘,’ by ‘;’ (…/HPL_gpukernel.c: 677)
PGC-S-0035-Syntax error: Recovery attempted by replacing ‘;’ by ‘!=’ (…/HPL_gpukernel.c: 677)
PGC-S-0035-Syntax error: Recovery attempted by replacing ‘,’ by ‘;’ (…/HPL_gpukernel.c: 681)
PGC-S-0035-Syntax error: Recovery attempted by replacing ‘;’ by ‘!=’ (…/HPL_gpukernel.c: 681)
PGC/x86-64 Linux 18.10-1: compilation completed with severe errors
make[2]: *** [HPL_gpukernel.o] Error 2
make[2]: Leaving directory '/home/hpl/hpl-2.0-openacc/src/auxil/CUDA'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory '/home/hpl/hpl-2.0-openacc'
make: *** [build] Error 2


Here are lines 671-685 of HPL_gpukernel.c, the file the errors point to:


   #pragma acc kernels loop
   for( j = 0, jbj = 0, jcj = 0; j < N; j++, jbj += LDB, jcj += LDC )
   {
      Cjcj = C+jcj;
      DSCAL( M, BETA, Cjcj, 1 );
      #pragma acc loop
      for( l = 0, jal = 0, iblj = jbj; l < K; l++, jal += LDA, iblj += 1 )
      {
         t0 = ALPHA * B[iblj];
         #pragma acc loop
         for( i = 0, iail = jal, icij = jcj; i < M; i++, iail += 1, icij += 1 )
         { C[icij] += A[iail] * t0; }
      }
   }
}

thx

Do the errors go away if you have just one variable in the for loop? For instance:

for (j=0; j<N; j++)

And handle the other variables in the body of the loop?

Thanks for the suggestion, but it didn’t work.
I tried writing a simple loop like yours, but the problem is still there.
Could the problem come from a wrong include file?

I’ve seen an issue like this where our compiler doesn’t seem to properly handle complex for loops, like so:

for( j = 0, jbj = 0, jcj = 0; j < N; j++, jbj += LDB, jcj += LDC )

If you re-write it like the following, are you saying that you’re still getting the same syntax errors? Are the syntax errors still occurring if you remove -acc?

j = 0;
jbj = 0;
jcj = 0;
#pragma acc ...
for (; j < N; j++) {
...
jbj += LDB;
jcj += LDC;
}

If you’re able to isolate a simple snippet that still demonstrates the issue, feel free to post it here. There’s another issue on the forum (“Syntax error on for-loop structure with multiple variable in”, Legacy PGI Compilers, NVIDIA Developer Forums) that is somewhat similar to the error you’re getting here. I filed a TPR for it, but if you have an example I can add it as a +1.
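A standalone reproducer could be as small as the following. This is only a sketch: the function name `scale_columns` and its arguments are made up for illustration, and it has not been confirmed that this exact file triggers PGC-S-0035 on 18.10. It simply mirrors the shape of the loop at HPL_gpukernel.c:672, a for loop with comma-separated initializers and increments directly under an OpenACC directive.

```c
/* Hypothetical reproducer (not taken from HPL): a for loop with several
   comma-separated initializers and increments placed directly under an
   OpenACC directive. Without -acc the pragma is ignored and this is
   ordinary C. */
void scale_columns(double *c, int n, int ldc, double beta)
{
    int j, jcj;

    #pragma acc kernels loop copy(c[0:n*ldc])
    for (j = 0, jcj = 0; j < n; j++, jcj += ldc) {
        c[jcj] *= beta;   /* scale the first element of each column */
    }
}
```

Compiling such a file once with `pgcc -acc -Minfo -c` and once without `-acc` would show whether the directive is what triggers the syntax error.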

Thanks for your suggestion.
If I take away -acc, the error goes away.
But then the HPL program is not accelerated.

And here is the code I fixed:

{
   register double            t0;
   int                        i, iail, iblj, icij, j, jal, jbj, jcj, l;
   double                     * Cjcj;
   jbj = 0;
   jcj  = 0;
   #pragma acc kernels loop
   for( j = 0 ; j < N; j++ )
   {
      jbj += LDB;
      jcj += LDC;
      Cjcj = C+jcj;
      DSCAL( M, BETA, Cjcj, 1 );
      jal = 0;
      iblj = jbj;
      #pragma acc loop
      for( l = 0 ; l < K; l++ )
      {
         jal += LDA;
         iblj += 1;
         t0 = ALPHA * B[iblj];
         iail = jal;
         icij = jcj;
         #pragma acc loop
         for( i = 0 ; i < M; i++ )
         {
            iail += 1;
            icij += 1;
            C[icij] += A[iail] * t0;
         }
      }
   }
}
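One thing to watch in a rewrite like this: the original loop headers update `jbj`, `jcj`, `jal`, `iblj`, `iail`, and `icij` after each iteration's body, whereas here they are updated before the body uses them, so every index is shifted by one step. A sketch that avoids the running counters altogether, by computing each index from the loop variables, could look like the following. The function names `gemm_nn_comma` and `gemm_nn_indexed` are hypothetical, the `DSCAL` call is written out as a plain loop so the example is self-contained, and the OpenACC directives are left out to keep it plain C.

```c
/* Reference: the comma-operator loops from the original post, with the
   DSCAL call replaced by an explicit scaling loop (an assumption made
   only to keep this sketch self-contained). */
void gemm_nn_comma(int M, int N, int K, double ALPHA,
                   const double *A, int LDA,
                   const double *B, int LDB,
                   double BETA, double *C, int LDC)
{
    int i, iail, iblj, icij, j, jal, jbj, jcj, l;
    double t0;

    for (j = 0, jbj = 0, jcj = 0; j < N; j++, jbj += LDB, jcj += LDC) {
        for (i = 0; i < M; i++)
            C[jcj + i] *= BETA;
        for (l = 0, jal = 0, iblj = jbj; l < K; l++, jal += LDA, iblj += 1) {
            t0 = ALPHA * B[iblj];
            for (i = 0, iail = jal, icij = jcj; i < M; i++, iail += 1, icij += 1)
                C[icij] += A[iail] * t0;
        }
    }
}

/* Same arithmetic with one variable per loop header. Every array index
   is derived from j, l, and i, so no index relies on a counter that an
   earlier iteration updated. */
void gemm_nn_indexed(int M, int N, int K, double ALPHA,
                     const double *A, int LDA,
                     const double *B, int LDB,
                     double BETA, double *C, int LDC)
{
    int i, j, l;

    for (j = 0; j < N; j++) {
        for (i = 0; i < M; i++)
            C[j * LDC + i] *= BETA;
        for (l = 0; l < K; l++) {
            double t0 = ALPHA * B[j * LDB + l];
            for (i = 0; i < M; i++)
                C[j * LDC + i] += A[l * LDA + i] * t0;
        }
    }
}
```

Running both functions on the same small matrices and comparing the resulting `C` is a quick way to confirm that a rewrite has not shifted any index.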

The new problem:

pgcc -o HPL_gpukernel.o -c  -DHPL_CALL_CBLAS  -I/home/ncku/z_acc/hpl-2.0-openacc/include -I/home/ncku/z_acc/hpl-2.0-openacc/include/cuda  -I /home/ncku/mpich/include -acc -fast -Minfo  ../HPL_gpukernel.c
HPL_accdgemm:
    912, HPL_accdgemm0 inlined, size=153 (inline) file ../HPL_gpukernel.c (832)
         842, Memory zero idiom, loop replaced by call to __c_mzero8
         851, Invariant if transformation
              Loop not fused: no successor loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated 1 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         856, Memory zero idiom, loop replaced by call to __c_mzero8
              Loop not fused: different loop trip count
              Loop not fused: function call before adjacent loop
              Loop not vectorized: data dependency
              Generated an alternate version of the loop
              Loop not vectorized: data dependency
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Loop not vectorized: data dependency
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Loop unrolled 8 times
              Generated 2 prefetches in scalar loop
              Generated 1 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         858, Invariant if transformation
              Loop not fused: no successor loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated 1 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
    917, HPL_accdgemm0 inlined, size=153 (inline) file ../HPL_gpukernel.c (832)
         842, Memory zero idiom, loop replaced by call to __c_mzero8
         851, Invariant if transformation
              Loop not fused: no successor loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated 1 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         856, Memory zero idiom, loop replaced by call to __c_mzero8
              Loop not fused: different loop trip count
              Loop not fused: function call before adjacent loop
              Loop not vectorized: data dependency
              Generated an alternate version of the loop
              Loop not vectorized: data dependency
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Loop not vectorized: data dependency
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Loop unrolled 8 times
              Generated 2 prefetches in scalar loop
              Generated 1 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         858, Invariant if transformation
              Loop not fused: no successor loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop containing reductions
              Generated a prefetch instruction for the loop
              Generated 1 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
HPL_accdtrsm0:
    453, Memory zero idiom, loop replaced by call to __c_mzero8
    464, HPL_accdtrsmLUNN inlined, size=25 (inline) file ../HPL_gpukernel.c (123)
         126, Loop not fused: no successor loop
         128, Loop not fused: dependence chain to sibling loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
         133, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
    465, HPL_accdtrsmLUNU inlined, size=22 (inline) file ../HPL_gpukernel.c (150)
         153, Loop not fused: no successor loop
         155, Loop not fused: dependence chain to sibling loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
         159, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
    470, HPL_accdtrsmLUTN inlined, size=20 (inline) file ../HPL_gpukernel.c (176)
         180, Loop not fused: no successor loop
         185, Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              FMA (fused multiply-add) instruction(s) generated
    471, HPL_accdtrsmLUTU inlined, size=18 (inline) file ../HPL_gpukernel.c (203)
         207, Loop not fused: no successor loop
         209, Generated 1 prefetches in scalar loop
         212, Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              FMA (fused multiply-add) instruction(s) generated
    479, HPL_accdtrsmLLNN inlined, size=25 (inline) file ../HPL_gpukernel.c (15)
          18, Loop not fused: no successor loop
          20, Loop not fused: dependence chain to sibling loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
          24, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
    480, HPL_accdtrsmLLNU inlined, size=22 (inline) file ../HPL_gpukernel.c (41)
          44, Loop not fused: no successor loop
          46, Loop not fused: dependence chain to sibling loop
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
          49, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
    485, HPL_accdtrsmLLTN inlined, size=23 (inline) file ../HPL_gpukernel.c (66)
          70, Loop not fused: no successor loop
          76, Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              FMA (fused multiply-add) instruction(s) generated
    486, HPL_accdtrsmLLTU inlined, size=21 (inline) file ../HPL_gpukernel.c (95)
          99, Loop not fused: no successor loop
         105, Generated an alternate version of the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              Generated vector simd code for the loop containing reductions
              Generated 2 prefetch instructions for the loop
              FMA (fused multiply-add) instruction(s) generated
    497, HPL_accdtrsmRUNN inlined, size=27 (inline) file ../HPL_gpukernel.c (337)
         340, Loop not fused: no successor loop
         342, Loop not fused: different loop trip count
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
         343, Loop not fused: different loop trip count
         345, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
         348, Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 1 prefetches in scalar loop
    498, HPL_accdtrsmRUNU inlined, size=22 (inline) file ../HPL_gpukernel.c (362)
         365, Loop not fused: no successor loop
         367, Loop not fused: different loop trip count
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
         370, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
    503, HPL_dtrsmRUTN inlined, size=30 (inline) file ../HPL_gpukernel.c (386)
         390, Loop not fused: no successor loop
         393, Loop not fused: different loop trip count
              Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 1 prefetches in scalar loop
         394, Loop not fused: different loop trip count
         397, Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 2 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         400, Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
    504, HPL_accdtrsmRUTU inlined, size=24 (inline) file ../HPL_gpukernel.c (414)
         418, Loop not fused: no successor loop
         421, Loop not fused: different loop trip count
         424, Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 2 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         427, Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
    512, HPL_accdtrsmRLNN inlined, size=31 (inline) file ../HPL_gpukernel.c (229)
         232, Loop not fused: no successor loop
         235, Loop not fused: different loop trip count
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
         236, Loop not fused: different loop trip count
         239, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
         242, Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 1 prefetches in scalar loop
    513, HPL_accdtrsmRLNU inlined, size=25 (inline) file ../HPL_gpukernel.c (256)
         259, Loop not fused: no successor loop
         262, Loop not fused: different loop trip count
              Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
         266, Loop not vectorized: data dependency
              Loop unrolled 4 times
              FMA (fused multiply-add) instruction(s) generated
    518, HPL_accdtrsmRLTN inlined, size=30 (inline) file ../HPL_gpukernel.c (282)
         286, Loop not fused: no successor loop
         288, Loop not fused: different loop trip count
              Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 1 prefetches in scalar loop
         289, Loop not fused: different loop trip count
         293, Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 2 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         296, Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
    519, HPL_accdtrsmRLTU inlined, size=24 (inline) file ../HPL_gpukernel.c (310)
         316, Loop not fused: different loop trip count
         320, Loop not vectorized: data dependency
              Loop unrolled 8 times
              Generated 2 prefetches in scalar loop
              FMA (fused multiply-add) instruction(s) generated
         323, Generated an alternate version of the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
              Generated vector simd code for the loop
              Generated a prefetch instruction for the loop
PGC-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Could not find allocated-variable index for symbol - C (../HPL_gpukernel.c: 673)
HPL_accdgemmNN:
    674, Complex loop carried dependence of A->,C->,B-> prevents parallelization
         Accelerator kernel generated
         Generating Tesla code
        674, #pragma acc loop seq
        679, #pragma acc loop seq
        683, #pragma acc loop seq
        691, #pragma acc loop seq
    674, Complex loop carried dependence of A->,C->,B-> prevents parallelization
    679, Scalar last value needed after loop for Cjcj-> at line 679
         Loop carried scalar dependence for Cjcj at line 679
         Scalar last value needed after loop for Cjcj at line 679
    683, Accelerator restriction: size of the GPU copy of B is unknown
         Complex loop carried dependence of B->,A->,C-> prevents parallelization
    691, Accelerator restriction: size of the GPU copy of C,A is unknown
         Complex loop carried dependence of A->,C-> prevents parallelization
         Parallelization requires privatization of C-> as well as last value
PGC-F-0704-Compilation aborted due to previous errors. (../HPL_gpukernel.c)
PGC/x86-64 Linux 18.10-1: compilation aborted
Makefile:94: recipe for target 'HPL_gpukernel.o' failed
make[2]: *** [HPL_gpukernel.o] Error 2
make[2]: Leaving directory '/home/ncku/z_acc/hpl-2.0-openacc/src/auxil/cuda'
Make.top:54: recipe for target 'build_src' failed
make[1]: *** [build_src] Error 2
make[1]: Leaving directory '/home/ncku/z_acc/hpl-2.0-openacc'
Makefile:72: recipe for target 'build' failed
make: *** [build] Error 2

I think you’ll need to send a compilable example for us to look at further. The compiler is complaining about how variables in your loop expressions are declared/available on the device. Send it to trs@pgroup.com and we will take a look.
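For reference, the -Minfo output above includes "size of the GPU copy of B is unknown" and "size of the GPU copy of C,A is unknown". With pointer arguments in C, messages like these usually mean the compiler cannot tell how many elements to move to the device, and explicit data clauses are the usual way to say so. The following is a sketch only, not the HPL author's code and not verified to resolve PGC-S-0155: the function name `gemm_nn_acc` is hypothetical, the array extents assume column-major storage (A is LDA x K, B is LDB x N, C is LDC x N), the `DSCAL` call is folded into the update, and the loops are restructured to accumulate into a local `sum`.

```c
/* Hypothetical OpenACC variant of the no-transpose update
   C = BETA*C + ALPHA*A*B, with explicit array shapes in the data
   clauses. The extents LDA*K, LDB*N, and LDC*N are assumptions based
   on column-major storage. */
void gemm_nn_acc(int M, int N, int K, double ALPHA,
                 const double *restrict A, int LDA,
                 const double *restrict B, int LDB,
                 double BETA, double *restrict C, int LDC)
{
    int i, j, l;

    #pragma acc parallel loop gang \
        copyin(A[0:LDA*K], B[0:LDB*N]) copy(C[0:LDC*N])
    for (j = 0; j < N; j++) {
        #pragma acc loop vector
        for (i = 0; i < M; i++) {
            double sum = 0.0;          /* private to each (i, j) pair */
            #pragma acc loop seq
            for (l = 0; l < K; l++)
                sum += A[l * LDA + i] * B[j * LDB + l];
            C[j * LDC + i] = BETA * C[j * LDC + i] + ALPHA * sum;
        }
    }
}
```

Folding the scaling into the loop avoids calling an external routine such as `DSCAL` from inside the accelerator region, which would otherwise need a device version of that routine. Without `-acc` the directives are ignored and the function runs as ordinary C, which makes it easy to check the numbers on the host first.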

The multiple initializations in the for-loop should be fixed for pgcc on Linux in release 20.1, once it is released.