OpenACC: Array Create = "unsupported statement type&amp

Are local arrays supported in OpenACC? If I change ‘flux_x_temp’ to an allocatable it works, but it feels rather weird to allocate the array on the host only to forget about it completely and let OpenACC use it on the device instead. What’s the best practice for local arrays?

Tested on SUSE Linux and OS X; same results.
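
For reference, a minimal sketch of the allocatable variant that does compile for me (the allocate/deallocate placement here is just my guess; only the declaration changes otherwise):

subroutine kernel()
	implicit none
	real(8), allocatable :: flux_x_temp(:,:,:)
	integer(4) :: i, j

	! allocated on the host, but only ever written on the device
	allocate(flux_x_temp(5,5,1))
!$acc kernels create(flux_x_temp)
!$acc loop independent
	do j=1,5
!$acc loop independent
		do i = 1,5
			flux_x_temp(i,j,1) = 2.0d0
		end do
	end do
!$acc end kernels
	deallocate(flux_x_temp)
end subroutine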

The entire code:

module test
implicit none
contains
subroutine wrapper()
	implicit none
	call kernel ()
end subroutine

subroutine kernel()
	implicit none
	real(8) :: flux_x_temp(5,5,1)
	integer(4) :: i, j

!$acc kernels create(flux_x_temp)
!$acc loop independent
	do j=1,5
!$acc loop independent
		do i = 1,5
			flux_x_temp(i,j,1) = 2.0d0
		end do
	end do
!$acc end kernels
end subroutine
end module

program asuca
use test, only: wrapper
implicit none
call wrapper()
stop
end program

Result:

pgf90 -Minfo=accel,inline -Mneginfo -ta=nvidia test_openacc.f90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages (test_openacc.f90: 16)
kernel:
16, Accelerator region ignored
18, Accelerator restriction: loop contains unsupported statement type
19, Accelerator restriction: unsupported statement type
0 inform, 0 warnings, 1 severes, 0 fatal for kernel
pgf90 -v
Export PGI=/usr/apps.sp3/isv/pgi/14.7
pgf90-Warning-No files to process

Hi MuellerM,

The problem here is that the dead-code elimination optimization removes the assignment, since the “flux_x_temp” array is never used. Compiling the code without optimization (-O0), or actually using the values in the array, will allow the region to be accelerated.

% pgf90 -acc test.f90 -Minfo=accel -O0
kernel:
     14, Generating create(flux_x_temp(:,:,:))
         Generating Tesla code
     16, Loop is parallelizable
     18, Loop is parallelizable
         Accelerator kernel generated
         16, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
         18, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
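
For the second option, a minimal sketch of what “using the values” could look like: switching create to copyout so the result comes back to the host, and consuming it there (the print is just for illustration):

!$acc kernels copyout(flux_x_temp)
!$acc loop independent
	do j=1,5
!$acc loop independent
		do i = 1,5
			flux_x_temp(i,j,1) = 2.0d0
		end do
	end do
!$acc end kernels
	! consuming the result on the host keeps the assignment live,
	! so the region is accelerated even at the default optimization level
	print *, sum(flux_x_temp)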

Hope this helps,
Mat

Thanks Mat, I didn’t expect this to kick in before the OpenACC parallelization. Wouldn’t it be better to run these optimizations after the CUDA (or PTX) code has been generated, to avoid this sort of error?

Edit: Ah, I think this might come from the optimizer touching the code tree only after the Fortran has been parsed, but not again after the CUDA C has been generated? I think I can understand the reasoning behind this: the speedups I sometimes see for compute-bound code versus naive CUDA C code wouldn’t be possible otherwise, unless you made another pass over the generated C code.