I’m porting an existing CFD code to OpenACC and am trying to optimize the data transfers a bit.
With PGI_ACC_TIME=1 set, I get the following breakdown:
27: data region reached 11 times
27: data copyin reached 440 times
device time(us): total=1,216,222 max=2,815 min=1,992 avg=2,764
84: data copyout reached 77 times
device time(us): total=186,841 max=2,701 min=1,467 avg=2,426
My data region statement is:
*$acc data pcopyin(x, u) pcopyout(xmu)
where x, u, and xmu are multidimensional (5-d) arrays in Fortran. Since I’m calling the routine 11 times for the benchmark, I expected the copyin to occur 22 times (perhaps 33 to allow for the alloc) and the copyout to occur 11 times. Instead, the profile works out to 40 copyins (440/11) and 7 copyouts (77/11) per call.
Any hints as to why the number of copies doesn’t match this expectation?
I also tried replacing the data region with explicit acc_copyin(), acc_create(), acc_delete(), and acc_copyout() calls (along with a present clause on the kernel), but that isn’t working properly even though the number of transfers is 3. That’s for another post, though.
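For reference, the explicit-call variant can be sketched roughly as follows. This is a minimal, hypothetical example and not the actual CFD code: the array names x, u, and xmu come from the post, but the module name, the extents, the loop body, and the routine compute_xmu are invented for illustration. It is written in free-form Fortran with the `!$acc` sentinel instead of the fixed-form `*$acc` used above, and it assumes a compiler that provides the `openacc` module.

```fortran
! Hypothetical sketch: array names match the post, but the shapes,
! the loop body, and the routine names are invented for illustration.
module flow_data
  implicit none
  integer, parameter :: n = 8          ! assumed extent of each dimension
  real, allocatable :: x(:,:,:,:,:), u(:,:,:,:,:), xmu(:,:,:,:,:)
end module flow_data

program unstructured_data_sketch
  use openacc                          ! OpenACC runtime API
  use flow_data
  implicit none
  integer :: iter

  allocate(x(n,n,n,n,n), u(n,n,n,n,n), xmu(n,n,n,n,n))
  x = 1.0
  u = 2.0

  ! Move the inputs to the device once and allocate the output there.
  call acc_copyin(x)
  call acc_copyin(u)
  call acc_create(xmu)

  do iter = 1, 11                      ! 11 calls, as in the benchmark
    call compute_xmu()
  end do

  ! Bring the result back once, then release the inputs.
  call acc_copyout(xmu)
  call acc_delete(x)
  call acc_delete(u)

  print *, 'xmu(1,1,1,1,1) =', xmu(1,1,1,1,1)

contains

  subroutine compute_xmu()
    integer :: i, j, k, l, m
    ! present() asserts that the arrays are already on the device,
    ! so this compute region should not trigger its own transfers.
    !$acc parallel loop collapse(5) present(x, u, xmu)
    do m = 1, n
      do l = 1, n
        do k = 1, n
          do j = 1, n
            do i = 1, n
              xmu(i,j,k,l,m) = x(i,j,k,l,m) * u(i,j,k,l,m)
            end do
          end do
        end do
      end do
    end do
  end subroutine compute_xmu

end program unstructured_data_sketch
```

In this arrangement the only explicit transfers are the two copyins and the one copyout, which would line up with the 3 transfers mentioned above. The same lifetime could presumably also be expressed with `enter data` and `exit data` directives instead of the runtime calls.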
Thanks for any guidance here.