Question about data region

Hi,

I created a data region with !$acc data copy/copyin/create for GPU parallel computing, but some of the results really confuse me.

Here is the code:

subroutine loop
use head !!!!!!store data 
N=0
N_pre=0
etm=300
N_cdcl=10
N_state=300
!$acc data copyin(f(1:im,1:jm,0:kk),feq(1:im,1:jm,0:kk),ex(0:kk),ey(0:kk),y(1:jm),x(1:im)) &
!$acc      create(fm(1:im,1:jm,0:kk),aa_cy1(:,:),aa_cy2(:,:,:),aax(:,:),aay(:,:)) &
!$acc      copy(ux(1:im,1:jm),uy(1:im,1:jm),rho(1:im,1:jm))
100 continue
call collision!!!!!!!!!!!!
call stream!!!!!!!!!!!!!!parallel computing ux,uy,rho
call macro!!!!!!!!!!!!!!!
!$acc update host(ux(1:im,1:jm),uy(1:im,1:jm),rho(1:im,1:jm))
call velocity_modified!!!!!!!!!!!!!calculate with cpu

time=N*dt*U

if(mod(N,N_cdcl).eq.0) then
    write(31,*) time,cd_cy
    write(32,*) time,cl_cy
    write(33,*) time,cd_f(1)
    write(34,*) time,cl_f(1)
    write(35,*) time,cd_f(2)
    write(36,*) time,cl_f(2)

    write(41,*) time,xyfl(nfl,1,1)
    write(42,*) time,xyfl(nfl,2,1)
    write(43,*) time,xyfl(nfl,1,2)
    write(44,*) time,xyfl(nfl,2,2)

    write(47,*) time,dyc
    write(48,*) time,voc
    write(5,*) time,spow
    print *,time,'cd=',cd_cy,',cl=',cl_cy,',dyc=',dyc
    !$acc update device(ux(1:im,1:jm),uy(1:im,1:jm))
    call cylinder_vorticity

    write(45,*) time,vor_up
    write(46,*) time,vor_dn
endif

if(mod(N,200).eq.0) then
    call vorticity!!!!!!!!!!!!calculate vor with GPU

    call output!!!!!!!!!!!!!!output rho ux uy vor
endif
if(time.ge.120.0.and.time.le.140.0) then
    if(mod(N,N_cdcl).eq.0) then
        write(51,*) xyfl(nfl,1,1),xyfl(nfl,2,1)
        write(52,*) xyfl(nfl,1,2),xyfl(nfl,2,2)
        write(53,*) cd_cy,cl_cy
        write(54,*) cd_f(1),cl_f(1)
        write(55,*) cd_f(2),cl_f(2)
        write(64,*) dyc,cd_cy
        write(65,*) dyc,cl_cy
    endif
endif

if(mod(N,5000).eq.0) call wholefield

call motion_cylinder
call forcing
call equilibrium

N=N+1
if(time.le.etm) goto 100
!$acc end data
end subroutine loop

After the computation, I found that the value of rho was not right, so I added !$acc update host(rho(:,:)), and then the result was correct. However, vor was correct without any update directive. As far as I can tell, the only difference between them is that rho is declared in the data clause and vor is not.
So I wonder: must variables declared in a data clause be passed back to the host with an update directive, while those not declared are updated automatically? Am I right?
If not, please tell me how data clauses and data movement into and out of kernels/parallel regions work. Thanks a lot.

Hi Guo shuhao,

Compute regions (“kernels” and “parallel”) have an implicit data region. For arrays, the default for this implicit data region is “present or copy” semantics, meaning that if the data is already present on the device then no copy is done and the runtime uses the data already there. If the data is not present, then the runtime will copy it to the device at the start of the compute region and back to the host at the end.

Most likely what’s happening with “vor” is that since it is not present (i.e. it’s not in the outer data region), then the runtime is copying it back and forth for you. While convenient, it is not always the best thing for performance since the copy will be done every time the compute region is encountered.
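To make the implicit behavior concrete, here is a minimal sketch. The module and routine names (implicit_copy_demo, scale_implicit) are made up for illustration and are not from your program. The array is not in any enclosing data region, so the kernels construct should treat it as an implicit copy on every call, which is what I suspect is happening with “vor”.

```fortran
module implicit_copy_demo
  implicit none
contains
  ! Hypothetical routine, not from the original program.
  ! "a" is not in any enclosing data region, so the kernels construct
  ! applies an implicit present-or-copy: "a" is copied to the device on
  ! entry and back to the host on exit, every time this routine is called.
  subroutine scale_implicit(a, n, s)
    integer, intent(in) :: n
    double precision, intent(inout) :: a(n)
    double precision, intent(in) :: s
    integer :: i

    !$acc kernels
    do i = 1, n
      a(i) = s * a(i)
    end do
    !$acc end kernels
    ! After this point the host copy of "a" is up to date again.
  end subroutine scale_implicit
end module implicit_copy_demo
```

If the same routine were called from inside a data region that already lists the array, the implicit copy would be skipped and the host copy would stay stale until the data region ends or an update directive is used.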

Data regions allow you to take over control as to when the data is created and copied to/from the device. Depending on the clauses you’re using (copy, copyin, copyout, create), the runtime will either copy the data at the beginning and ending of the region, just at the beginning, just at the ending, or not at all.
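As a rough illustration of how the clauses differ, the sketch below uses one array per clause. All names here (data_clause_demo, smooth, x, y, tmp) are hypothetical and chosen only for the example.

```fortran
module data_clause_demo
  implicit none
contains
  ! Hypothetical routine illustrating copyin / create / copyout.
  subroutine smooth(x, y, n)
    integer, intent(in) :: n
    double precision, intent(in)  :: x(n)
    double precision, intent(out) :: y(n)
    double precision :: tmp(n)   ! scratch array, only needed on the device
    integer :: i

    ! copyin(x) : host -> device at the start of the region only
    ! create(tmp): allocated on the device, never copied in either direction
    ! copyout(y): device -> host at the end of the region only
    !$acc data copyin(x) create(tmp) copyout(y)
    !$acc kernels
    do i = 1, n
      tmp(i) = 2.0d0 * x(i)
    end do
    do i = 1, n
      y(i) = tmp(i) + 1.0d0
    end do
    !$acc end kernels
    !$acc end data
  end subroutine smooth
end module data_clause_demo
```

Because all three arrays are present when the kernels construct is reached, the compute region itself does not move any data.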

If you need to synchronize the host and device memory between the start and the end of a data region, you will want to use the “update” directive. The compiler does not have enough information about the program to tell where to add the update directives, so it is up to you to add them.
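Here is a small sketch of that pattern, assuming a time loop that alternates device and host work the way yours does. The names (update_demo, iterate, total) are invented for the example, and the arithmetic is only a stand-in for the real computation.

```fortran
module update_demo
  implicit none
contains
  ! Hypothetical time loop: device work, then host work, in each step.
  subroutine iterate(rho, n, nsteps, total)
    integer, intent(in) :: n, nsteps
    double precision, intent(inout) :: rho(n)
    double precision, intent(out) :: total
    integer :: i, step

    total = 0.0d0
    !$acc data copy(rho)
    do step = 1, nsteps
      ! Device work: rho is already present, so nothing is copied here.
      !$acc kernels
      do i = 1, n
        rho(i) = rho(i) + 1.0d0
      end do
      !$acc end kernels

      ! The host is about to read rho: refresh the host copy first.
      !$acc update host(rho)
      total = total + sum(rho)

      ! The host changes a value: push it back before the next device step.
      rho(1) = 0.0d0
      !$acc update device(rho)
    end do
    !$acc end data
  end subroutine iterate
end module update_demo
```

Without the update host directive, the host-side sum would see the values rho had when the data region was entered. Without the update device directive, the host-side change to rho(1) would never reach the GPU and would be overwritten when the region ends.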

Does this explanation help?

-Mat

Thanks a lot!
I have one more question. In this program I created a module “head” to store global variables, and inside the data region I call some subroutines (such as “stream”) that compute on the GPU. This subroutine has some private (local) variables (“fm”, for example) that are used to calculate the values of global variables. There is no need to pass the value of “tem” between host and device, so I used a create clause at the beginning of the data region, and I received the error "Unknown symbol used in data clause - fm". So I wonder whether such private variables are copied back and forth every time the compute region is encountered, or whether they are automatically treated as present. Here is the code.
fm is used to calculate the value of f (a global variable).

subroutine stream
use head

dimension fm(im,jm,0:kk),a(3,kk),b(3,kk)
double precision fm,a,b
double precision dx1,dx2,dy1,dy2,f1,f2,f3

!$acc kernels
!$acc loop collapse(2),private(a(:,:),b(:,:))
do 10 i=2,im-1
do 10 j=2,jm-1
dx1=x(i-1)-x(i)
dx2=x(i+1)-x(i)
dy1=y(j-1)-y(j)
dy2=y(j+1)-y(j)
!$acc loop seq
do k=1,kk
a(1,k)=ex(k)*dt*(ex(k)*dt+dx2)/(dx1*(dx1-dx2))
a(2,k)=(ex(k)*dt+dx1)*(ex(k)*dt+dx2)/(dx1*dx2)
a(3,k)=ex(k)*dt*(ex(k)*dt+dx1)/(dx2*(dx2-dx1))

b(1,k)=ey(k)*dt*(ey(k)*dt+dy2)/(dy1*(dy1-dy2))
b(2,k)=(ey(k)*dt+dy1)*(ey(k)*dt+dy2)/(dy1*dy2)
b(3,k)=ey(k)*dt*(ey(k)*dt+dy1)/(dy2*(dy2-dy1))
end do
!$acc loop seq
do k=1,kk
f1=0.0
f2=0.0
f3=0.0
!$acc loop seq
do n=1,3
f1=f1+b(n,k)*f(i-1,j-2+n,k)
f2=f2+b(n,k)*f(i,j-2+n,k)
f3=f3+b(n,k)*f(i+1,j-2+n,k)
end do
fm(i,j,k)=a(1,k)*f1+a(2,k)*f2+a(3,k)*f3
end do
10 continue

do 20 i=2,im-1
do 20 j=2,jm-1
do 20 k=1,kk
f(i,j,k)=fm(i,j,k)
20 continue

do k=1,kk
do j=1,jm
f(1,j,k)=feq(1,j,k)+f(2,j,k)-feq(2,j,k)
f(im,j,k)=feq(im,j,k)+f(im-1,j,k)-feq(im-1,j,k)
end do

do i=2,im-1
f(i,1,k)=feq(i,1,k)+f(i,2,k)-feq(i,2,k)
f(i,jm,k)=feq(i,jm,k)+f(i,jm-1,k)-feq(i,jm-1,k)
end do
end do
!$acc end kernels

end subroutine stream

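In the code you posted, “fm” is declared locally in subroutine “stream”, so it is out of scope in subroutine “loop”, where the data region is. In other words, you can’t reference “fm” in a data clause there, which is why the compiler reports an unknown symbol.

Either move the data region for “fm” into “stream”, or move the declaration of “fm” into the module.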
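
As a sketch of the first option, the data clause for the scratch array can sit in the same routine that declares it. Everything below is hypothetical: head_sketch and stream_sketch stand in for your module and subroutine, the arrays are 2-D, and a simple average replaces the real interpolation.

```fortran
module head_sketch
  implicit none
  integer, parameter :: im = 8, jm = 6
  double precision :: f(im, jm)      ! stands in for the global array f
contains
  subroutine stream_sketch()
    double precision :: fm(im, jm)   ! local scratch array
    integer :: i, j

    ! fm is visible here, so it can appear in a data clause.
    ! create(fm): device-only storage, no host/device copies.
    ! copy(f): reuses f if an outer data region already made it present,
    !          otherwise copies it in and out.
    !$acc data create(fm) copy(f)
    !$acc kernels
    do j = 2, jm-1
      do i = 2, im-1
        fm(i,j) = 0.5d0 * (f(i-1,j) + f(i+1,j))
      end do
    end do
    do j = 2, jm-1
      do i = 2, im-1
        f(i,j) = fm(i,j)
      end do
    end do
    !$acc end kernels
    !$acc end data
  end subroutine stream_sketch
end module head_sketch
```

If a local array like this is left out of every data clause, it should fall under the implicit copy described earlier and be transferred in both directions each time the compute region runs, so create is the cheaper choice for a pure scratch array.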