Hi everyone,
I have a question about a loop I placed inside a kernel:
!Bits of code:
do i = 1, runs
   PHIN(x + 1, y + 1, z) = AN(x + 1, y + 1, z) * PHI(x + 1, y + 2, z) &
                         + AS(x + 1, y + 1, z) * PHI(x + 1, y, z) &
                         + AE(x + 1, y + 1, z) * PHI(x + 2, y + 1, z) &
                         + AW(x + 1, y + 1, z) * PHI(x, y + 1, z) &
                         + AP(x + 1, y + 1, z) * PHI(x + 1, y + 1, z)
end do
! This is how I copy the device data back to the CPU after the kernel has run:
PHIN = dev_PHIN
I noticed that the copy from the GPU to the CPU takes longer as runs increases. For example, with runs = 2 the copy takes twice as long, as if PHIN were copied to the CPU twice. I do not understand why, because I am only asking the kernel to perform the calculation twice, not to return the result twice. If someone could explain why this happens, I would really appreciate it.
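A minimal sketch of how the kernel time could be measured separately from the copy time, assuming the kernel launch returns to the host before the kernel has finished and the assignment from the device array waits for it. The kernel `work`, the array size, and the launch configuration are hypothetical placeholders, not the actual code:

```fortran
! Hypothetical stand-in kernel: repeats a cheap update `runs` times per element.
module work_mod
  use cudafor
  implicit none
contains
  attributes(global) subroutine work(a, n, runs)
    integer, value :: n, runs
    real :: a(n)
    integer :: idx, i
    idx = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (idx <= n) then
      do i = 1, runs
        a(idx) = 0.5 * a(idx) + 1.0
      end do
    end if
  end subroutine work
end module work_mod

program time_kernel_and_copy
  use cudafor
  use work_mod
  implicit none
  integer, parameter :: n = 1024 * 1024   ! placeholder array size
  integer, parameter :: runs = 2          ! same role as runs in the loop above
  real, allocatable :: PHIN(:)
  real, device, allocatable :: dev_PHIN(:)
  integer :: istat
  integer(8) :: t0, t1, t2, rate

  allocate(PHIN(n))
  allocate(dev_PHIN(n))
  dev_PHIN = 0.0

  call system_clock(t0, rate)
  call work<<<(n + 255) / 256, 256>>>(dev_PHIN, n, runs)
  istat = cudaDeviceSynchronize()   ! wait for the kernel so its time is not charged to the copy
  call system_clock(t1)
  PHIN = dev_PHIN                   ! device-to-host copy, timed on its own
  call system_clock(t2)

  print *, 'kernel time (s): ', real(t1 - t0) / real(rate)
  print *, 'copy time (s):   ', real(t2 - t1) / real(rate)
end program time_kernel_and_copy
```

If the copy time stays roughly constant when runs changes while the kernel time grows, the extra time seen at the copy was the host waiting for the kernel rather than the transfer itself.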
Thanks,
Chris