Could anybody please give me a hint, or even a solution? I have this very small program, and it compiles all right, but when I run it I get:
call to cuMemcpyHtoD returned error 1: Invalid value
CUDA driver version: 4010
It seems to have something to do with the update clause…
The problem is that you are mixing the two models. The “mirror” directive belongs to the PGI Accelerator Model, not to OpenACC. The equivalent OpenACC mechanism, device_resident, is not yet implemented, so the OpenACC version below uses a data region with create(eins) instead. Below are two versions of your program: the first uses only the PGI Accelerator Model and the second uses only OpenACC.
Hope this helps,
Mat
PGI Accelerator Model version:
$ cat test_pgi.f90
PROGRAM MAIN
IMPLICIT NONE
INTEGER :: zaehl
REAL, ALLOCATABLE :: eins(:)
!$acc mirror(eins)
REAL, ALLOCATABLE :: was(:)
ALLOCATE(eins(5))
ALLOCATE(was(5))
eins(:) = 1.
!$acc update device(eins)
!$acc region
DO zaehl = 1, 100
   was(zaehl) = eins(zaehl)
end do
!$acc end region
print *, was
END PROGRAM
$ pgf90 -ta=nvidia -Minfo test_pgi.f90
main:
7, Generating local(eins(:))
15, Generating update device(eins(:))
17, Generating copyout(was(1:100))
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
18, Loop is parallelizable
Accelerator kernel generated
18, !$acc do parallel, vector(96) ! blockidx%x threadidx%x
CC 1.0 : 4 registers; 44 shared, 4 constant, 0 local memory bytes; 100% occupancy
CC 2.0 : 8 registers; 4 shared, 56 constant, 0 local memory bytes; 50% occupancy
$ a.out
1.000000 1.000000 1.000000 1.000000
1.000000
OpenACC version:
$ cat test_acc.f90
PROGRAM MAIN
IMPLICIT NONE
INTEGER :: zaehl
REAL, ALLOCATABLE :: eins(:)
REAL, ALLOCATABLE :: was(:)
ALLOCATE(eins(5))
ALLOCATE(was(5))
!$acc data create(eins)
eins(:) = 1.
!$acc update device(eins)
!$acc kernels
DO zaehl = 1, 100
   was(zaehl) = eins(zaehl)
end do
!$acc end kernels
!$acc end data
print *, was
END PROGRAM
$ pgf90 -acc -Minfo test_acc.f90
main:
12, Generating local(eins(:))
16, Generating update device(eins(:))
18, Generating copyout(was(1:100))
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
19, Loop is parallelizable
Accelerator kernel generated
19, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
CC 1.0 : 4 registers; 48 shared, 4 constant, 0 local memory bytes; 33% occupancy
CC 2.0 : 8 registers; 4 shared, 60 constant, 0 local memory bytes; 16% occupancy
$ a.out
1.000000 1.000000 1.000000 1.000000
1.000000
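One more caution about both listings: eins and was are allocated with only 5 elements, but the loop runs from 1 to 100, so it reads and writes far past the end of both arrays (the -Minfo output even reports copyout(was(1:100))). That both programs print five 1.0 values is luck, not correctness. Below is a minimal sketch of the OpenACC version with the loop bounded by SIZE(was), assuming the intent is simply to copy every element of eins into was. The STOP 1 sanity check and the DEALLOCATE are additions for illustration only.

```fortran
PROGRAM MAIN
  IMPLICIT NONE
  INTEGER :: zaehl
  REAL, ALLOCATABLE :: eins(:)
  REAL, ALLOCATABLE :: was(:)
  ALLOCATE(eins(5))
  ALLOCATE(was(5))
  ! Allocate eins on the device for the lifetime of the data region
  !$acc data create(eins)
  eins(:) = 1.
  ! Copy the host values of eins into its device copy
  !$acc update device(eins)
  !$acc kernels
  ! Bound the loop by the allocated size instead of a hard-coded 100,
  ! so neither eins nor was is indexed past element 5
  DO zaehl = 1, SIZE(was)
     was(zaehl) = eins(zaehl)
  END DO
  !$acc end kernels
  !$acc end data
  ! Sanity check (illustrative addition): every element should be 1.0
  IF (ANY(was /= 1.)) STOP 1
  print *, was
  DEALLOCATE(eins, was)
END PROGRAM
```

Because the directives are ordinary comments to a compiler without OpenACC enabled, the same source also builds and runs as plain host Fortran, which makes it easy to compare host and accelerator results.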