Hi all,
With this OpenACC code:
#define NUM_T 73728
#define NUM_H 64
char *in = calloc(NUM_T * NUM_H * 144, sizeof(char));
#pragma acc enter data copyin(in[0:NUM_T * NUM_H * 144])
#pragma acc parallel loop independent vector_length(NUM_H) present(in)
for (unsigned int t = 0; t < NUM_T; t++)
{
    #pragma acc loop independent
    for (unsigned int h = 0; h < NUM_H; h++)
    {
        // Do some stuff ...
    }
}
I get the following profiling output:
Accelerator Kernel Timing data
(unknown)
  (unknown)  NVIDIA  devicenum=0
    time(us): 18
    0: upload reached 1 time
        0: data copyin transfers: 1
             device time(us): total=18 max=18 min=18 avg=18
D:\Developpement\OpenACC\TestACC\main.c
  main  NVIDIA  devicenum=0
    time(us): 489,931
    76: data region reached 1 time
        76: data copyin transfers: 41
             device time(us): total=489,931 max=12,114 min=6,060 avg=11,949
    77: compute region reached 1 time
        77: kernel launched 1 time
            grid: [65535]  block: [64]
            elapsed time(us): total=23,000 max=23,000 min=23,000 avg=23,000
    77: data region reached 2 times
Why are there 41 data copyin transfers rather than just one?
Thanks for any help with this.