Hello Mat,
I double-checked: it doesn't work correctly with PGI 13.3.
Before submitting the bug, I had already tested the wait directive, as you suggest, with no success. Even with 2 or 3 wait directives after the update it doesn't work.
I rechecked the test and tried your suggestion.
The only way to make it work correctly and reproducibly is to put a double update host before using the value in the host part?!
(Initialization on the host, or a print, has no effect or only a marginal one.)
!$acc update host(AA)
!$acc update host(AA)
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
Compilation with this double update:
pgf90 --version -ta=nvidia,cc13,cc20,cuda5.0 -Minfo=ccff,all,intensity -Mprof=ccff test_update_mat.f90 -o test_update_mat_133_dble_update 2>&1 | egrep "pgf90|update"
34, Generating update host(aa(:))
35, Generating update host(aa(:))
pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem
Running test_update_mat_133_dble_update 10 times:
for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_dble_update; done
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
Compilation with one update:
pgf90 --version -ta=nvidia,cc13,cc20,cuda5.0 -Minfo=ccff,all,intensity -Mprof=ccff test_update_mat.f90 -o test_update_mat_133_one_update 2>&1 | egrep "pgf90|update"
35, Generating update host(aa(:))
pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem
Running test_update_mat_133_one_update 10 times:
for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_one_update; done
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
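To compare many runs at a glance, the repeated output can be tallied instead of listed line by line. This is only a minimal sketch, assuming a POSIX shell with seq, sort and uniq available; tally_runs is a hypothetical helper name, not part of the PGI tools.

```shell
# Hypothetical helper: run a command N times and print each distinct
# output line (stdout and stderr) preceded by how often it occurred.
tally_runs() {
    n=$1
    shift
    for i in $(seq "$n"); do
        "$@" 2>&1
    done | sort | uniq -c
}

# Usage, assuming the executables built above are in the current directory:
#   tally_runs 10 env PGI_ACC_SYNCHRONOUS=0 ./test_update_mat_133_one_update
#   tally_runs 10 env PGI_ACC_SYNCHRONOUS=0 ./test_update_mat_133_dble_update
```

A single line with a count of 10 means every run gave the same result; several lines would point to a race between the update and the host code.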
Another point suggesting that PGI 13.3 is at fault: the same executable produces a cuMemcpyDtoHAsync error on a previous-generation GPU, a GTX 280.
pgaccelinfo
CUDA Driver Version: 5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.54 Sat Sep 29 00:05:49 PDT 2012
CUDA Device Number: 0
Device Name: GeForce GTX 280
Device Revision Number: 1.3
...
for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_one_update; done
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
No problem with the double update on this platform either.
I am posting the simplified source code again to be sure we are running the same test:
MODULE MODD_DATA
IMPLICIT NONE
INTEGER, PARAMETER :: Nvec=2048
REAL , ALLOCATABLE, DIMENSION(:) :: AA
!$acc mirror(AA)
CONTAINS
SUBROUTINE ALLOC_DATA_MODULE()
IMPLICIT NONE
ALLOCATE( AA(Nvec) )
END SUBROUTINE ALLOC_DATA_MODULE
SUBROUTINE INIT_DATA(XTAB,XVAL)
IMPLICIT NONE
REAL , DIMENSION(:) :: XTAB
!$acc reflected (XTAB)
REAL :: XVAL
!$acc kernels
XTAB = XVAL
!$acc end kernels
END SUBROUTINE INIT_DATA
END MODULE MODD_DATA
PROGRAM TEST_ASYNC
USE MODD_DATA
IMPLICIT NONE
CALL ALLOC_DATA_MODULE()
CALL INIT_DATA(AA, 10.0 )
!acc update host(AA)   ! disabled (no "$"); change to "!$acc" for the double-update test
!$acc update host(AA)
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
END PROGRAM TEST_ASYNC
See you,
Juan