update clause gives error

Hi,

Could anybody please give me a hint, or even a solution? I have this very small program. It compiles fine, but when I run it, I get:
call to cuMemcpyHtoD returned error 1: Invalid value
CUDA driver version: 4010
It seems to have something to do with the update directive…

Thanks a lot!

Dirk



PROGRAM MAIN

IMPLICIT NONE

INTEGER :: zaehl
REAL, ALLOCATABLE :: eins(:)
!$acc mirror(eins)
REAL, ALLOCATABLE :: was(:)

ALLOCATE(eins(5))
ALLOCATE(was(5))

eins(:) = 1.

!$acc update device(eins)

!$acc kernels loop copy(was(1:5))
DO zaehl = 1, 100
was(zaehl) = eins(zaehl)
end do
!$acc end kernels
print *, was
END PROGRAM

Hi Dirk,

The problem here is that you are mixing two programming models. The “mirror” directive is part of the PGI Accelerator Model, not OpenACC, so it should not be combined with OpenACC directives such as “kernels loop”. The closest OpenACC equivalent, the device_resident clause of the declare directive, is not yet implemented. Below are two versions of your program, the first using only the PGI Accelerator Model and the second using only OpenACC.

Hope this helps,
Mat

PGI Accelerator version:

$ cat test_pgi.f90 
PROGRAM MAIN

IMPLICIT NONE

INTEGER :: zaehl
REAL, ALLOCATABLE :: eins(:)
!$acc mirror(eins)
REAL, ALLOCATABLE :: was(:)

ALLOCATE(eins(5))
ALLOCATE(was(5))

eins(:) = 1.

!$acc update device(eins)

!$acc region 
DO zaehl = 1, 100
was(zaehl) = eins(zaehl)
end do
!$acc end region
print *, was
END PROGRAM
$ pgf90 -ta=nvidia -Minfo test_pgi.f90
main:
      7, Generating local(eins(:))
     15, Generating update device(eins(:))
     17, Generating copyout(was(1:100))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     18, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(96) ! blockidx%x threadidx%x
             CC 1.0 : 4 registers; 44 shared, 4 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 8 registers; 4 shared, 56 constant, 0 local memory bytes; 50% occupancy
$ a.out
    1.000000        1.000000        1.000000        1.000000     
    1.000000

OpenACC version:

$ cat test_acc.f90 
PROGRAM MAIN

IMPLICIT NONE

INTEGER :: zaehl
REAL, ALLOCATABLE :: eins(:)
REAL, ALLOCATABLE :: was(:)

ALLOCATE(eins(5))
ALLOCATE(was(5))

!$acc data create(eins)

eins(:) = 1.

!$acc update device(eins)

!$acc kernels 
DO zaehl = 1, 100
was(zaehl) = eins(zaehl)
end do
!$acc end kernels
!$acc end data

print *, was
END PROGRAM
$ pgf90 -acc -Minfo test_acc.f90
main:
     12, Generating local(eins(:))
     16, Generating update device(eins(:))
     18, Generating copyout(was(1:100))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     19, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
             CC 1.0 : 4 registers; 48 shared, 4 constant, 0 local memory bytes; 33% occupancy
             CC 2.0 : 8 registers; 4 shared, 60 constant, 0 local memory bytes; 16% occupancy
$ a.out
    1.000000        1.000000        1.000000        1.000000     
    1.000000
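
A side note on both listings: eins and was are allocated with 5 elements, but the loop runs from 1 to 100, which is why the compiler reports copyout(was(1:100)). Indexing past the allocated size is outside what Fortran allows, even if the program happens to print the expected values. A minimal sketch of the OpenACC version with the loop bound tied to the allocated size could look like the following. The parameter name n and the explicit copyout(was) clause are assumptions added here for illustration; they are not part of the original posts, and this variant has not been checked against the compiler output shown above.

```fortran
PROGRAM MAIN

IMPLICIT NONE

! n is a hypothetical name for the array length (not in the original posts)
INTEGER, PARAMETER :: n = 5
INTEGER :: zaehl
REAL, ALLOCATABLE :: eins(:)
REAL, ALLOCATABLE :: was(:)

ALLOCATE(eins(n))
ALLOCATE(was(n))

! Create a device copy of eins for the duration of the data region
!$acc data create(eins)

! Initialize on the host, then push the values to the device copy
eins(:) = 1.

!$acc update device(eins)

! Loop only over the elements that were actually allocated;
! copyout(was) is an assumed explicit clause to bring the result back
!$acc kernels copyout(was)
DO zaehl = 1, n
   was(zaehl) = eins(zaehl)
END DO
!$acc end kernels

!$acc end data

PRINT *, was
END PROGRAM MAIN
```

Built the same way as test_acc.f90 above (pgf90 -acc -Minfo), this should print five values of 1.0. Without -acc the directives are treated as comments and the program runs on the host only.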