I have a question about data transfer times. When I transfer several arrays of the same size to the device and time each transfer, the first one takes noticeably longer than the later ones.
! Copyin data
a_dev = a(1:2,0:100,0:100)
b_cdev = b(1:2,0:100,0:100)
c_cdev = c(1:2,0:100,0:100)
Measured with cpu_time calls, most of the time is spent in the a_dev transfer. I also tried cudaMemcpy(), with the same result. Is this some kind of one-time initialization cost for setting up the transfer process? Any hints on how to avoid it are welcome.
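For reference, here is a minimal sketch of how the timing could be set up to check whether the extra time really is a one-time cost. It is only an illustration under assumptions: the program name, the `warm`/`warm_dev` dummy buffers, the whole-array shapes, and the idea of doing a small warm-up transfer before starting the timer are all hypothetical and not taken from my actual code.

```fortran
program transfer_timing
  use cudafor
  implicit none

  ! Host arrays (shapes assumed to match the slices used above)
  real :: a(2,0:100,0:100), b(2,0:100,0:100)
  ! Device arrays of the same shape
  real, device, allocatable :: a_dev(:,:,:), b_dev(:,:,:)
  ! Hypothetical dummy buffers used only for the warm-up transfer
  real :: warm(1)
  real, device :: warm_dev(1)
  real :: t0, t1, t2
  integer :: istat

  a = 1.0
  b = 2.0
  warm = 0.0

  allocate(a_dev(2,0:100,0:100), b_dev(2,0:100,0:100))

  ! Assumption: a first small transfer absorbs any one-time setup cost
  warm_dev = warm
  istat = cudaDeviceSynchronize()

  call cpu_time(t0)
  a_dev = a                        ! first timed transfer
  istat = cudaDeviceSynchronize()
  call cpu_time(t1)
  b_dev = b                        ! second timed transfer, same size
  istat = cudaDeviceSynchronize()
  call cpu_time(t2)

  print *, 'a_dev transfer (s): ', t1 - t0
  print *, 'b_dev transfer (s): ', t2 - t1

  deallocate(a_dev, b_dev)
end program transfer_timing
```

If the two printed times come out similar with the warm-up in place, and differ again when the warm-up lines are removed, that would suggest the extra time belongs to the first device access rather than to the a_dev transfer itself.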