Hello,
does anybody have a FORTRAN code example for an “up-sweep” of an array that is optimal for the PGI Accelerator Programming Model?
1, 2, 3, 4, 5, 6, 7, 8, 9, …
1, 1+2, 3, 3+4, 5, 5+6, 7, 7+8, 9,…
…
Thank you!
Hello,
does anybody have a FORTRAN code example for an “up-sweep” of an array that is optimal for the PGI Accelerator Programming Model?
1, 2, 3, 4, 5, 6, 7, 8, 9, …
1, 1+2, 3, 3+4, 5, 5+6, 7, 7+8, 9,…
…
Thank you!
Hi elephant,
I don’t, but maybe someone else does.
Do you have an non-accelerator example source (that’s parallel) you could post? Granted, I’m not sure upsweep will work well on a GPU, but having something to start from would be helpful.
Hi,
The example source for an up-sweep would look something like this:
do d=0,int(log(dble(N))/log(2.0)-1)
do i=0,N-1,(2**(d+1))
T(i+2**(d+1)-1)=T(i+2**d-1)+T(i+2**(d+1)-1)
end do
end do
Where T is the 1-dim array of size N which entries has to be summed up.
But since I only have to sum up every 8 neighboring entries (sum(1:8), sum(9:16),… ) and not the whole array, I rearanged N to another array Q of size (N/8,8) and wrote a code like the following and got very good speed-up!!!
!$acc region
do i=1,N/8
Qtemp1(i) = Q(i,1)+Q(i,2)+Q(i,3)+Q(i,4) &
+Q(i,5)+Q(i,6)+Q(i,7)+Q(i,8)
end do
!$acc end region
So, problem solved.
Thank you!