I have a question about a loop I placed inside a kernel:
! Bits of code:
do i = 1, runs
   PHIN(x + 1, y + 1, z) = AN(x + 1, y + 1, z) * PHI(x + 1, y + 2, z) &
                         + AS(x + 1, y + 1, z) * PHI(x + 1, y, z) &
                         + AE(x + 1, y + 1, z) * PHI(x + 2, y + 1, z) &
                         + AW(x + 1, y + 1, z) * PHI(x, y + 1, z) &
                         + AP(x + 1, y + 1, z) * PHI(x + 1, y + 1, z)
end do

! This is how I copied the device data back to the CPU after the execution of the kernel:
PHIN = dev_PHIN
I noticed that copying the values from the GPU back to the CPU took longer as runs increased. For example, with runs = 2 it took twice as long to return the data, as if PHIN were being copied to the CPU twice. I do not understand why, because I only asked the kernel to perform the calculation twice, not to return the result twice. If someone could explain why this happens, I would really appreciate it.