Thanks for your suggestion.
If I remove -acc, the error goes away.
But then the HPL program is not accelerated.
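Without -acc, pgcc ignores every `#pragma acc` directive, so the loops compile and run serially on the host. That explains why the error disappears but nothing is accelerated. Below is a minimal sketch for checking at run time which mode a build is in. It assumes only the standard `_OPENACC` macro and `acc_get_num_devices()` from `openacc.h`. The function names `openacc_build_enabled` and `report_acc_mode` are hypothetical.

```c
#include <stdio.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Returns 1 if this file was compiled with OpenACC enabled (e.g. pgcc -acc),
 * 0 otherwise. Without -acc, every #pragma acc line is ignored and the
 * annotated loops run serially on the host. */
int openacc_build_enabled(void)
{
#ifdef _OPENACC
    return 1;
#else
    return 0;
#endif
}

/* Prints which mode this build uses. With OpenACC it also reports how many
 * NVIDIA devices the runtime can see. */
void report_acc_mode(void)
{
#ifdef _OPENACC
    printf("OpenACC %d: %d NVIDIA device(s) visible\n",
           _OPENACC, acc_get_num_devices(acc_device_nvidia));
#else
    printf("OpenACC disabled: #pragma acc directives are ignored\n");
#endif
}
```

Calling `report_acc_mode()` once near the start of the program is enough to tell a host-only build from an accelerated one.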
Here is the code I modified:
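```c
{
   register double t0;
   int i, iail, iblj, icij, j, jal, jbj, jcj, l;
   double * Cjcj;
   jbj = 0;
   jcj = 0;
#pragma acc kernels loop
   for( j = 0 ; j < N; j++ )
   {
      Cjcj = C+jcj;                    /* column j of C */
      DSCAL( M, BETA, Cjcj, 1 );       /* C(:,j) = BETA * C(:,j) */
      jal = 0;
      iblj = jbj;
#pragma acc loop
      for( l = 0 ; l < K; l++ )
      {
         t0 = ALPHA * B[iblj];         /* ALPHA * B(l,j) */
         iail = jal;
         icij = jcj;
#pragma acc loop
         for( i = 0 ; i < M; i++ )
         {
            C[icij] += A[iail] * t0;   /* C(i,j) += A(i,l) * ALPHA * B(l,j) */
            /* offsets are advanced after use, so row 0 and column 0 are included */
            iail += 1;
            icij += 1;
         }
         jal += LDA;                   /* next column of A */
         iblj += 1;                    /* next row of B */
      }
      jbj += LDB;                      /* next column of B */
      jcj += LDC;                      /* next column of C */
   }
}
```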
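In the loop above, the offsets jbj, jcj, jal, iblj, iail and icij, and the pointer Cjcj, carry over from one iteration to the next. The compiler therefore cannot treat the iterations as independent. DSCAL is presumably a host BLAS call (the build defines HPL_CALL_CBLAS). A host routine cannot run inside a kernels region unless a device version is declared with `#pragma acc routine`.

Below is a minimal sketch under the following assumptions:
- HPL's column-major storage.
- `restrict` pointers, meaning A, B and C do not overlap.
- A hypothetical function name, `accdgemm_nn_sketch`.

The sketch makes these changes:
- It computes every offset from the loop counters.
- It inlines the BETA scaling.
- It keeps the l loop sequential, because every l iteration updates the same column of C.
- It gives each array an explicit section in the data clauses, so the size of each GPU copy is known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: C := BETA*C + ALPHA*A*B for column-major A (M x K),
 * B (K x N) and C (M x N), the "NN" case. Every offset is computed from the
 * loop counters, so no scalar is carried from one iteration to the next. */
void accdgemm_nn_sketch(int M, int N, int K, double ALPHA,
                        const double *restrict A, int LDA,
                        const double *restrict B, int LDB,
                        double BETA, double *restrict C, int LDC)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    assert(LDA >= M && LDB >= K && LDC >= M);

    /* Array sections tell the compiler how much of each array to copy. */
#pragma acc kernels copyin(A[0:LDA*K], B[0:LDB*N]) copy(C[0:LDC*N])
    {
#pragma acc loop independent
        for (int j = 0; j < N; j++) {
            double *Cj = C + (size_t)j * LDC;        /* column j of C */
            const double *Bj = B + (size_t)j * LDB;  /* column j of B */

            /* Inlined replacement for DSCAL(M, BETA, Cj, 1). */
#pragma acc loop independent
            for (int i = 0; i < M; i++)
                Cj[i] *= BETA;

            /* Every l iteration updates the same column Cj, so this loop
             * stays sequential; the i loop inside it is parallel. */
#pragma acc loop seq
            for (int l = 0; l < K; l++) {
                const double t0 = ALPHA * Bj[l];
                const double *Al = A + (size_t)l * LDA; /* column l of A */
#pragma acc loop independent
                for (int i = 0; i < M; i++)
                    Cj[i] += Al[i] * t0;
            }
        }
    }
}
```

With pgcc, `-Minfo=accel` shows how the compiler actually maps the j and i loops to gangs and vectors. The schedule is left to the compiler here.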
Now I get a new error:
pgcc -o HPL_gpukernel.o -c -DHPL_CALL_CBLAS -I/home/ncku/z_acc/hpl-2.0-openacc/include -I/home/ncku/z_acc/hpl-2.0-openacc/include/cuda -I /home/ncku/mpich/include -acc -fast -Minfo ../HPL_gpukernel.c
HPL_accdgemm:
912, HPL_accdgemm0 inlined, size=153 (inline) file ../HPL_gpukernel.c (832)
842, Memory zero idiom, loop replaced by call to __c_mzero8
851, Invariant if transformation
Loop not fused: no successor loop
Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated 1 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
856, Memory zero idiom, loop replaced by call to __c_mzero8
Loop not fused: different loop trip count
Loop not fused: function call before adjacent loop
Loop not vectorized: data dependency
Generated an alternate version of the loop
Loop not vectorized: data dependency
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Loop not vectorized: data dependency
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Loop unrolled 8 times
Generated 2 prefetches in scalar loop
Generated 1 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
858, Invariant if transformation
Loop not fused: no successor loop
Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated 1 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
917, HPL_accdgemm0 inlined, size=153 (inline) file ../HPL_gpukernel.c (832)
842, Memory zero idiom, loop replaced by call to __c_mzero8
851, Invariant if transformation
Loop not fused: no successor loop
Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated 1 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
856, Memory zero idiom, loop replaced by call to __c_mzero8
Loop not fused: different loop trip count
Loop not fused: function call before adjacent loop
Loop not vectorized: data dependency
Generated an alternate version of the loop
Loop not vectorized: data dependency
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Loop not vectorized: data dependency
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Loop unrolled 8 times
Generated 2 prefetches in scalar loop
Generated 1 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
858, Invariant if transformation
Loop not fused: no successor loop
Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated vector simd code for the loop containing reductions
Generated a prefetch instruction for the loop
Generated 1 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
HPL_accdtrsm0:
453, Memory zero idiom, loop replaced by call to __c_mzero8
464, HPL_accdtrsmLUNN inlined, size=25 (inline) file ../HPL_gpukernel.c (123)
126, Loop not fused: no successor loop
128, Loop not fused: dependence chain to sibling loop
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
133, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
465, HPL_accdtrsmLUNU inlined, size=22 (inline) file ../HPL_gpukernel.c (150)
153, Loop not fused: no successor loop
155, Loop not fused: dependence chain to sibling loop
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
159, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
470, HPL_accdtrsmLUTN inlined, size=20 (inline) file ../HPL_gpukernel.c (176)
180, Loop not fused: no successor loop
185, Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
FMA (fused multiply-add) instruction(s) generated
471, HPL_accdtrsmLUTU inlined, size=18 (inline) file ../HPL_gpukernel.c (203)
207, Loop not fused: no successor loop
209, Generated 1 prefetches in scalar loop
212, Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
FMA (fused multiply-add) instruction(s) generated
479, HPL_accdtrsmLLNN inlined, size=25 (inline) file ../HPL_gpukernel.c (15)
18, Loop not fused: no successor loop
20, Loop not fused: dependence chain to sibling loop
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
24, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
480, HPL_accdtrsmLLNU inlined, size=22 (inline) file ../HPL_gpukernel.c (41)
44, Loop not fused: no successor loop
46, Loop not fused: dependence chain to sibling loop
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
49, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
485, HPL_accdtrsmLLTN inlined, size=23 (inline) file ../HPL_gpukernel.c (66)
70, Loop not fused: no successor loop
76, Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
FMA (fused multiply-add) instruction(s) generated
486, HPL_accdtrsmLLTU inlined, size=21 (inline) file ../HPL_gpukernel.c (95)
99, Loop not fused: no successor loop
105, Generated an alternate version of the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
FMA (fused multiply-add) instruction(s) generated
497, HPL_accdtrsmRUNN inlined, size=27 (inline) file ../HPL_gpukernel.c (337)
340, Loop not fused: no successor loop
342, Loop not fused: different loop trip count
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
343, Loop not fused: different loop trip count
345, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
348, Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 1 prefetches in scalar loop
498, HPL_accdtrsmRUNU inlined, size=22 (inline) file ../HPL_gpukernel.c (362)
365, Loop not fused: no successor loop
367, Loop not fused: different loop trip count
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
370, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
503, HPL_dtrsmRUTN inlined, size=30 (inline) file ../HPL_gpukernel.c (386)
390, Loop not fused: no successor loop
393, Loop not fused: different loop trip count
Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 1 prefetches in scalar loop
394, Loop not fused: different loop trip count
397, Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 2 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
400, Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
504, HPL_accdtrsmRUTU inlined, size=24 (inline) file ../HPL_gpukernel.c (414)
418, Loop not fused: no successor loop
421, Loop not fused: different loop trip count
424, Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 2 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
427, Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
512, HPL_accdtrsmRLNN inlined, size=31 (inline) file ../HPL_gpukernel.c (229)
232, Loop not fused: no successor loop
235, Loop not fused: different loop trip count
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
236, Loop not fused: different loop trip count
239, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
242, Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 1 prefetches in scalar loop
513, HPL_accdtrsmRLNU inlined, size=25 (inline) file ../HPL_gpukernel.c (256)
259, Loop not fused: no successor loop
262, Loop not fused: different loop trip count
Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
266, Loop not vectorized: data dependency
Loop unrolled 4 times
FMA (fused multiply-add) instruction(s) generated
518, HPL_accdtrsmRLTN inlined, size=30 (inline) file ../HPL_gpukernel.c (282)
286, Loop not fused: no successor loop
288, Loop not fused: different loop trip count
Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 1 prefetches in scalar loop
289, Loop not fused: different loop trip count
293, Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 2 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
296, Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
519, HPL_accdtrsmRLTU inlined, size=24 (inline) file ../HPL_gpukernel.c (310)
316, Loop not fused: different loop trip count
320, Loop not vectorized: data dependency
Loop unrolled 8 times
Generated 2 prefetches in scalar loop
FMA (fused multiply-add) instruction(s) generated
323, Generated an alternate version of the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
Generated vector simd code for the loop
Generated a prefetch instruction for the loop
PGC-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Could not find allocated-variable index for symbol - C (../HPL_gpukernel.c: 673)
HPL_accdgemmNN:
674, Complex loop carried dependence of A->,C->,B-> prevents parallelization
Accelerator kernel generated
Generating Tesla code
674, #pragma acc loop seq
679, #pragma acc loop seq
683, #pragma acc loop seq
691, #pragma acc loop seq
674, Complex loop carried dependence of A->,C->,B-> prevents parallelization
679, Scalar last value needed after loop for Cjcj-> at line 679
Loop carried scalar dependence for Cjcj at line 679
Scalar last value needed after loop for Cjcj at line 679
683, Accelerator restriction: size of the GPU copy of B is unknown
Complex loop carried dependence of B->,A->,C-> prevents parallelization
691, Accelerator restriction: size of the GPU copy of C,A is unknown
Complex loop carried dependence of A->,C-> prevents parallelization
Parallelization requires privatization of C-> as well as last value
PGC-F-0704-Compilation aborted due to previous errors. (../HPL_gpukernel.c)
PGC/x86-64 Linux 18.10-1: compilation aborted
Makefile:94: recipe for target 'HPL_gpukernel.o' failed
make[2]: *** [HPL_gpukernel.o] Error 2
make[2]: Leaving directory '/home/ncku/z_acc/hpl-2.0-openacc/src/auxil/cuda'
Make.top:54: recipe for target 'build_src' failed
make[1]: *** [build_src] Error 2
make[1]: Leaving directory '/home/ncku/z_acc/hpl-2.0-openacc'
Makefile:72: recipe for target 'build' failed
make: *** [build] Error 2
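The messages for lines 683 and 691 report two problems:
- Parallelizing the inner loops would require privatizing C, because every iteration of the l loop adds into the same elements of C.
- The compiler sees complex dependences through A, B and C, likely because it cannot rule out that the pointers overlap.

A different loop order avoids both problems. It uses one loop iteration per element C(i,j) and accumulates the sum over l in a private scalar. The sketch below is an alternative formulation rather than a drop-in replacement. It assumes column-major storage and non-aliasing `restrict` pointers, and the function name `accdgemm_nn_collapse` is hypothetical. It applies ALPHA once to the finished sum, so its results can differ in the last bits from the original update order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alternative: one iteration per element C(i,j). The sum over l
 * lives in a private scalar, so no two iterations write the same element. */
void accdgemm_nn_collapse(int M, int N, int K, double ALPHA,
                          const double *restrict A, int LDA,
                          const double *restrict B, int LDB,
                          double BETA, double *restrict C, int LDC)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    assert(LDA >= M && LDB >= K && LDC >= M);

#pragma acc parallel loop collapse(2) copyin(A[0:LDA*K], B[0:LDB*N]) copy(C[0:LDC*N])
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < M; i++) {
            double sum = 0.0;  /* private to this (i, j) iteration */
#pragma acc loop seq
            for (int l = 0; l < K; l++)
                sum += A[i + (size_t)l * LDA] * B[l + (size_t)j * LDB];
            C[i + (size_t)j * LDC] = BETA * C[i + (size_t)j * LDC] + ALPHA * sum;
        }
    }
}
```

With j outer and i inner in the collapsed loop, neighbouring iterations touch neighbouring elements of A and C in column-major storage.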