Hi again,
I took some advice from one of my previous posts and tried playing around with the data region directive. From my understanding, !$acc data region local(…) should make all arrays in the list local within the data region. Yet that does not seem to be the case in my program:
program main
  !=====( Initialization )===========================!
  use accel_lib
  real, dimension(:), allocatable :: randnums, seeds
  double precision, dimension(:), allocatable :: particles
  real, parameter :: PI = 4 * atan(1.0)
  integer :: numparticles, numseeds, seedinit, count(200)
  integer, parameter :: K4B=selected_int_kind(9)
  integer(K4B), parameter :: IA=16807,IM=2147483647,IQ=127773,IR=2836
  real :: am
  integer(K4B) :: gix,giy,gk,ix,iy,k
  write(*,*) "How many particles?"
  read(*,*) numparticles
  write(*,*) "Initial seed?"
  read(*,*) seedinit
  numseeds = int(sqrt(200.0 * numparticles)) + 1
  !future note: set some limit to the number of random numbers since the array might
  !not be able to hold all of it
  allocate(particles(numparticles))
  allocate(seeds(numseeds))
  allocate(randnums(numseeds ** 2))
  call acc_init(acc_device_nvidia)
  !=====( Generate Random Numbers )==================!
!$acc data region local(particles, seeds, randnums), copyout(count)
!$acc region
  am = nearest(1.0,-1.0)/IM
  iy=ior(ieor(888889999,abs(seedinit)),1)
  ix=ieor(777755555,abs(seedinit))
!$acc do kernel
  do j = 1, numseeds
    ix=ieor(ix,ishft(ix,13))
    ix=ieor(ix,ishft(ix,-17))
    ix=ieor(ix,ishft(ix,5))
    k=iy/IQ
    iy=IA*(iy-k*IQ)-IR*k
    if (iy < 0) iy=iy+IM
    seeds(j)=am*ior(iand(IM,ieor(ix,iy)),1)
  end do
!$acc do vector(256), parallel, independent
  do j = 1, numseeds
    giy=ior(ieor(888889999, int(1000 * seeds(j)) + 1), 1)
    gix=ieor(777755555, int(1000 * seeds(j)) + 1)
!$acc do seq
    do jj = 1, numseeds
      gix=ieor(gix,ishft(gix,13))
      gix=ieor(gix,ishft(gix,-17))
      gix=ieor(gix,ishft(gix,5))
      gk=giy/IQ
      giy=IA*(giy-gk*IQ)-IR*gk
      if (giy < 0) giy=giy+IM
      randnums((j - 1) * numseeds + jj) = am*ior(iand(IM,ieor(gix,giy)),1)
    end do
  end do
  !=====( Main )=================================!
!$acc do vector(256), parallel, independent
  do j = 1, numparticles
    particles(j) = 1
  end do
!$acc end region
  do i = 1,20
    print *,randnums(i)
  enddo
  do j = 1,200
!$acc region do vector(256), parallel, independent
    do jj = 1, numparticles
      if (randnums((j - 1) * numparticles + jj) .lt. 0.1) particles(jj) = 0
    end do
    do jj = 1,numparticles
      if (particles(jj) .eq. 1) count(j) = count(j) + 1
    enddo
    if (count(j) .eq. 0) goto 100
  enddo
100 continue
!$acc end data region
  open(unit = 2, file = 'data.txt')
  write(2,1000) 0, numparticles
  do i = 1, 200
    write(2,1000) i, count(i)
    if (count(i) .eq. 0) exit
  end do
  write(2,*) "c Cycle Number Number of particles"
1000 format (i5,i10)
end program
Yet all the printed values of randnums are 0.000000. It seems that randnums was made local only within the region directive and not within the data region directive.
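One guess I have not verified yet: local(…) may only allocate the arrays on the device and never copy them back, so a print on the host would still see the untouched host copy. Below is a minimal sketch of what I think might be needed, assuming the update host directive of the PGI Accelerator model copies device data back to the host inside a data region. The array a and the size n are made up for illustration and are not from my program.

```fortran
program update_host_sketch
  implicit none
  integer, parameter :: n = 8   ! hypothetical size, for illustration only
  real :: a(n)                  ! hypothetical array standing in for randnums
  integer :: i

  a = 0.0
!$acc data region local(a)
!$acc region
  do i = 1, n
    a(i) = real(i)
  end do
!$acc end region
  ! Assumption: local() gives a device copy that is never copied back,
  ! so the host copy of a is refreshed explicitly before it is printed.
!$acc update host(a)
  print *, a
!$acc end data region
end program update_host_sketch
```

If that assumption holds, the same idea in my program would be an update host(randnums) placed just before the loop that prints randnums.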
I also have a few questions regarding data transfer. Here is the compiler output:
     31, Generating local(randnums(:))
         Generating local(seeds(:))
         Generating local(particles(:))
         Generating copyout(count(:))
     32, Generating compute capability 1.3 binary
     38, Loop carried scalar dependence for 'ix' at line 39
         Loop carried scalar dependence for 'iy' at line 42
         Loop carried scalar dependence for 'iy' at line 43
         Inner sequential loop scheduled on host
     49, Loop is parallelizable
         Accelerator kernel generated
         49, !$acc do parallel, vector(256)
             Using register for 'seeds'
             CC 1.3 : 15 registers; 20 shared, 100 constant, 0 local memory bytes; 100 occupancy
     54, Loop carried scalar dependence for 'gix' at line 55
         Loop carried scalar dependence for 'giy' at line 58
         Loop carried scalar dependence for 'giy' at line 59
         Complex loop carried dependence of 'randnums' prevents parallelization
     67, Loop is parallelizable
         Accelerator kernel generated
         67, !$acc do parallel, vector(256)
             CC 1.3 : 4 registers; 24 shared, 68 constant, 0 local memory bytes; 100 occupancy
     77, Generating compute capability 1.3 binary
     78, Loop is parallelizable
         Accelerator kernel generated
         78, !$acc do parallel, vector(256)
             CC 1.3 : 5 registers; 20 shared, 56 constant, 0 local memory bytes; 100 occupancy
It says that the loop at line 38 (the do loop after !$acc do kernel in my code) is executed on the host, while the loop at line 49 (the do loop right after it) is executed on the device. If the array ‘seeds’ is local to the device (because of the data region directive), do the values of ‘seeds’ computed on the host get carried over to the device?
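If the answer is no, I assume the host values would have to be pushed to the device explicitly. Here is a small sketch under that assumption, using an update device directive between a host loop and a device region. The names n and out are hypothetical, the seed values are placeholders, and I have not run this on the accelerator.

```fortran
program update_device_sketch
  implicit none
  integer, parameter :: n = 8   ! hypothetical size, for illustration only
  real :: seeds(n), out(n)      ! out is a made-up result array
  integer :: j

!$acc data region local(seeds), copyout(out)
  ! Sequential loop outside any compute region: it runs on the host
  ! and fills only the host copy of seeds.
  do j = 1, n
    seeds(j) = real(j)
  end do
  ! Assumption: this copies the host values into the device copy
  ! that local() allocated.
!$acc update device(seeds)
!$acc region
  do j = 1, n
    out(j) = 2.0 * seeds(j)
  end do
!$acc end region
!$acc end data region
  print *, out
end program update_device_sketch
```

In my program the sequential loop sits inside the !$acc region, so I suspect the region would have to be split around it for this to apply. Declaring seeds with copyin instead of local would not help here either, as far as I can tell, because seeds is filled after the data region is entered.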