Is it possible to use 4 nested loops with OpenACC?

rikisyo · September 13, 2013, 2:53pm

I am trying to offload 4 nested loops to the GPU with OpenACC. Here is a simplified example:

Subroutine indexed_copy_4d( &
   arr_dst, arr_src, &
   i0,i1,is, j0,j1,js, k0,k1,ks, m0,m1,ms, &
   ki_dst, kj_dst, kk_dst, km_dst, kc_dst, &
   ki_src, kj_src, kk_src, km_src, kc_src )

Implicit None

Real, Intent(out), Dimension(1:) :: arr_dst
Real, Intent(in), Dimension(1:) :: arr_src

Integer, Intent(in) :: &
   i0,i1,is, j0,j1,js, k0,k1,ks, m0,m1,ms, &
   ki_dst, kj_dst, kk_dst, km_dst, kc_dst, &
   ki_src, kj_src, kk_src, km_src, kc_src

Integer :: i,j,k,m

!$acc kernels present(arr_dst,arr_src)
!$acc loop independent
do i=i0,i1,is
   !$acc loop independent
   do j=j0,j1,js
      !$acc loop independent
      do k=k0,k1,ks

         !$acc loop seq              ! $$$$
         do m=m0,m1,ms               ! $$$$

            arr_dst(ki_dst*i+kj_dst*j+kk_dst*k+kc_dst) = arr_src(ki_src*i+kj_src*j+kk_src*k+kc_src)

         enddo                       ! $$$$

      enddo
   enddo
enddo
!$acc end kernels

End Subroutine indexed_copy_4d

Eventually m needs to be included in the calculated index, but that's irrelevant here. The problem is that the compiler always fails with an internal error:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unknown variable reference (nestedloop.f90: 23)
PGF90-S-0000-Internal compiler error. gen_aili: unrec. ili opcode:     345 (nestedloop.f90: 29)
pgf90-Fatal-/home/lluo6/pgi/linux86-64/13.8/bin/pgf902 TERMINATED by signal 11
Arguments to /home/lluo6/pgi/linux86-64/13.8/bin/pgf902
/home/lluo6/pgi/linux86-64/13.8/bin/pgf902 /tmp/pgf90RsHcbFp-ah1T.ilm -fn nestedloop.f90 -opt 2 -terse 1 -inform warn -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -x 59 4 -x 59 4 -tp istanbul -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 70 0x40000000 -x 124 1 -accel nvidia -accel host -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 189 8 -x 176 0x140000 -x 177 0x0202007f -x 176 0x100 -x 186 0x10000 -x 176 0x100 -x 186 0x20000 -x 176 0x100 -x 176 0x100 -x 189 4 -y 70 0x40000000 -cmdline '+pgf90 nestedloop.f90 -acc -c' -asm /tmp/pgf90ZsHczs4J7LZr.s

I tried using the parallel construct, changing the loop order, and so on, but I always get an internal error like the one above.

However, if I remove all the lines marked with “! $$$$” (i.e., the innermost loop), the code compiles without any problem.

Equivalent code would be straightforward to write in CUDA, so I don't understand why a sequential loop inside each kernel thread would cause trouble like this.

Comments are welcome.

MatColgrove · September 13, 2013, 4:47pm

Hi rikisyo,

This is a compiler bug that appears to have started with release 13.3, when we increased the depth of the loop analysis. The error is caused by the skip count (stride) in the “m” loop, so the workaround is to remove “,ms”.
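If the stride in the “m” loop is still needed, one possible way to apply the workaround without losing it is sketched below. This is a minimal sketch under stated assumptions, not tested against PGI 13.8: the sequential loop runs over a unit-stride counter n (so there is no skip count) and m is recomputed as m0 + n*ms. The trip count nm uses the same formula Fortran applies to "do m=m0,m1,ms", so the same values of m are visited for positive or negative ms. The module name indexed_copy_mod, the variables n and nm, and the use of km_dst/km_src in the index (following the remark that m will eventually be part of the calculated index) are hypothetical additions for illustration.

```fortran
! Hypothetical module wrapping the subroutine so it has an explicit interface
! (required because the arrays are assumed-shape).
module indexed_copy_mod
   implicit none
contains
   subroutine indexed_copy_4d( &
      arr_dst, arr_src, &
      i0,i1,is, j0,j1,js, k0,k1,ks, m0,m1,ms, &
      ki_dst, kj_dst, kk_dst, km_dst, kc_dst, &
      ki_src, kj_src, kk_src, km_src, kc_src )

      real, intent(out), dimension(1:) :: arr_dst
      real, intent(in),  dimension(1:) :: arr_src
      integer, intent(in) :: &
         i0,i1,is, j0,j1,js, k0,k1,ks, m0,m1,ms, &
         ki_dst, kj_dst, kk_dst, km_dst, kc_dst, &
         ki_src, kj_src, kk_src, km_src, kc_src

      integer :: i, j, k, m, n, nm

      ! Same iteration count Fortran computes for "do m=m0,m1,ms"
      nm = max((m1 - m0 + ms) / ms, 0)

      !$acc kernels present(arr_dst,arr_src)
      !$acc loop independent
      do i = i0, i1, is
         !$acc loop independent
         do j = j0, j1, js
            !$acc loop independent
            do k = k0, k1, ks
               !$acc loop seq
               do n = 0, nm - 1          ! unit stride: no skip count
                  m = m0 + n*ms          ! recover the strided value of m
                  ! Assumption: m enters the index through km_dst / km_src
                  arr_dst(ki_dst*i + kj_dst*j + kk_dst*k + km_dst*m + kc_dst) = &
                     arr_src(ki_src*i + kj_src*j + kk_src*k + km_src*m + kc_src)
               end do
            end do
         end do
      end do
      !$acc end kernels
   end subroutine indexed_copy_4d
end module indexed_copy_mod
```

When ms is always 1, simply dropping “,ms” is enough; the remapped counter only matters when the stride can differ from 1.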

I added TPR#19579 and sent it to engineering. Since we're in the late stages of 13.9 release testing, I doubt a fix will make it into 13.9. It's possible, but more likely the fix will go into 13.10.

Mat

rikisyo · September 13, 2013, 5:06pm

Problem solved.

Thank you!

tull · November 1, 2013, 9:11pm

This has been fixed in the 13.10 release.

thanks,
dave

rikisyo · November 7, 2013, 2:07pm

Thanks for the update!
