No parallel kernels found, accelerator region ignored

I modified the f2.f program (which compiles and runs
as expected) from the following site:
http://www.pgroup.com/lit/articles/insider/v1n1a1.htm
to:

program main
use accel_lib
integer :: n,n1 ! size of the vector
real,dimension(:),allocatable :: a ! the vector
real,dimension(:),allocatable :: b ! the vector
real,dimension(:),allocatable :: r ! the results
real,dimension(:),allocatable :: e ! expected results
integer :: i
integer :: c0, c1, c2, c3, cgpu, chost
character(10) :: arg1
if( iargc() .gt. 0 )then
call getarg( 1, arg1 )
read(arg1,'(i10)') n
else
n = 100000
endif
n1 = 1
if( n .le. 0 ) n = 100000
allocate(a(n))
allocate(b(n))
allocate(r(n))
allocate(e(n))
do i = 1,n
a(i) = i*2.0
b(i) = i*2.0
enddo
call system_clock( count=c1 )
!call acc_init( acc_device_nvidia )
!$acc region
do i = n1,n
r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
enddo
!$acc end region
call multiply1()
call system_clock( count=c2 )
cgpu = c2 - c1
do i = 1,n
e(i) = sin(a(i)) ** 2 + cos(a(i)) ** 2
enddo
call system_clock( count=c3 )
chost = c3 - c2
! check the results
do i = 1,n
if( abs(r(i) - e(i)) .gt. 0.000001 )then
print *, i, r(i), e(i)
endif
enddo
print *, n, ' iterations completed'
print *, cgpu, ' microseconds on GPU'
print *, chost, ' microseconds on host'

contains

subroutine multiply1()

!call acc_init( acc_device_nvidia )
!$acc region
do i = n1,n
r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
enddo
!$acc end region
end subroutine

end program


When I compile this, I get the following error:

main:
29, No parallel kernels found, accelerator region ignored
31, Accelerator restriction: induction variable live-out from loop: i
32, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: .dY0002
multiply1:
57, No parallel kernels found, accelerator region ignored
59, Accelerator restriction: induction variable live-out from loop: i
60, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: .dY0005

Does anyone know what is going on?

Hi gbj,

This looks like a compiler error caused by the use of the contained subroutine. I’ve sent a report to our engineers (TPR#16595), and hopefully we can have this fixed soon.

The workaround is to turn the contained subroutine into an external subroutine and pass the arrays and loop bounds to it as arguments.

% cat f1.f
        program main
        use accel_lib
        implicit none
        integer :: n,n1         ! size of the vector
        real,dimension(:),allocatable :: a ! the vector
        real,dimension(:),allocatable :: b ! the vector
        real,dimension(:),allocatable :: r ! the results
        real,dimension(:),allocatable :: e ! expected results
        integer :: i,ii,iargc
        integer :: c0, c1, c2, c3, cgpu, chost
        character(10) :: arg1
        if( iargc() .gt. 0 )then
           call getarg( 1, arg1 )
           read(arg1,'(i10)') n
        else
           n = 100000
        endif
        n1 = 1
        if( n .le. 0 ) n = 100000
        allocate(a(n))
        allocate(b(n))
        allocate(r(n))
        allocate(e(n))
        do i = 1,n
           a(i) = i*2.0
           b(i) = i*2.0
        enddo
        call acc_init( acc_device_nvidia )
        call system_clock( count=c1 )

!$acc region
        do i = n1,n
           r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
        enddo
!$acc end region

        call multiply1(r,a,b,n1,n)
        call system_clock( count=c2 )
        cgpu = c2 - c1
        do i = 1,n
        e(i) = sin(a(i)) ** 2 + cos(a(i)) ** 2
        enddo
        call system_clock( count=c3 )
        chost = c3 - c2
!       check the results
        do i = 1,n
           if( abs(r(i) - e(i)) .gt. 0.000001 )then
              print *, i, r(i), e(i)
           endif
        enddo
        print *, n, ' iterations completed'
        print *, cgpu, ' microseconds on GPU'
        print *, chost, ' microseconds on host'

        end program


        subroutine multiply1(r,a,b,n1,n)
        implicit none
        real,dimension(*) :: a ! the vector
        real,dimension(*) :: b ! the vector
        real,dimension(*) :: r ! the results
        integer :: n, n1, i

!       call acc_init( acc_device_nvidia )
!$acc region
        do i = n1,n
            r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
        enddo
!$acc end region
        end subroutine

% pgf90 -ta=nvidia,time -Minfo=accel f1.f -V10.2 -fastsse -o f1.out
main:
     31, Generating copyin(b(1:n))
         Generating copyin(a(1:n))
         Generating copyout(r(1:n))
     32, Loop is parallelizable
         Accelerator kernel generated
         32, !$acc do parallel, vector(256)
multiply1:
     66, Generating copyin(b(n1:n))
         Generating copyin(a(n1:n))
         Generating copyout(r(n1:n))
     67, Loop is parallelizable
         Accelerator kernel generated
         67, !$acc do parallel, vector(256)
%
% f1.out
       100000  iterations completed
         2699  microseconds on GPU
         1432  microseconds on host

Accelerator Kernel Timing data
/tmp/f1.f
  multiply1
    66: region entered 1 time
        time(us): total=1211
                  kernels=155 data=1056
        67: kernel launched 1 times
            grid: [391]  block: [256]
            time(us): total=155 max=155 min=155 avg=155
/tmp/f1.f
  main
    31: region entered 1 time
        time(us): total=1482
                  kernels=164 data=1318
        32: kernel launched 1 times
            grid: [391]  block: [256]
            time(us): total=164 max=164 min=164 avg=164
acc_init.c
  acc_init
    41: region entered 1 time
        time(us): init=4293831

Thanks,
Mat

Thank you, Mat. Can you please notify me when this update has been applied?

Hi Gustaaf,

I’ve added you to the notification list for TPR#16595. I’ll also update this post once a fix is available.

- Mat