Hi,
When I set PGI_ACC_TIME=1 for my MPI + OpenACC application, I get a profile dump like this:
time(us): 44,660,964
22: compute region reached 148500 times
    22: data copyin transfers: 148500
        device time(us): total=16,976,751 max=1,409 min=4 avg=114
    22: kernel launched 148500 times
        grid: [92] block: [256]
        device time(us): total=21,528,393 max=152 min=144 avg=144
        elapsed time(us): total=90,369,207 max=67,390 min=168 avg=608
    22: data copyout transfers: 148500
        device time(us): total=3,537,741 max=1,012 min=12 avg=23
22: data region reached 297000 times
    22: data copyin transfers: 148500
        device time(us): total=2,164,391 max=1,038 min=5 avg=14
The parallel region is:
22: #pragma acc parallel loop present(…) copyin(…)
Could you please let me know what the "data copyin transfers" and "data copyout transfers" entries reported under "compute region reached" (before and after the kernel launch timings) refer to? I do not do any copyout in the parallel region. I believe any copyin I do in the parallel construct is already accounted for under the data region, as "data copyin transfers" (the last three lines of the dump).
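In case it helps, here is a stripped-down sketch of the same clause pattern. It is not the application code: the function name `scale_add`, the array names, the sizes, the loop body, and the enclosing `data` region are all hypothetical stand-ins, chosen only so that the `parallel loop` carries a `present(...)` and a `copyin(...)` clause as line 22 does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real kernel. All names, sizes and the
   arithmetic are assumptions made for illustration only. */
void scale_add(const double *a, const double *coeff,
               double *out, size_t n, size_t ncoeff)
{
    /* Assumed enclosing data region: keeps a and out resident on the
       device so that present(...) below is satisfied. */
    #pragma acc data copyin(a[0:n]) copyout(out[0:n])
    {
        /* Same clause shape as the construct at line 22: present(...)
           for arrays already on the device, copyin(...) for a small
           input that is sent with each launch. */
        #pragma acc parallel loop present(a[0:n], out[0:n]) copyin(coeff[0:ncoeff])
        for (size_t i = 0; i < n; ++i) {
            out[i] = a[i] * coeff[i % ncoeff];
        }
    }
}
```

Without OpenACC enabled the pragmas are ignored and the loop runs on the host, so the sketch can be checked with any C compiler. Built with the PGI compiler using -acc and run with PGI_ACC_TIME=1, it should give a dump with the same kind of per-line breakdown, although I have not confirmed that it shows the extra copyin/copyout entries.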
Thanks,
Naga