Data transfer times

Hi

I have a question about data transfer times. When I transfer several arrays of the same size to the device and time each transfer, the first array takes noticeably longer than the arrays transferred after it.

e.g.:

! Copyin data
a_dev = a(1:2,0:100,0:100)
b_cdev = b(1:2,0:100,0:100)
c_cdev = c(1:2,0:100,0:100)

Measured with cpu_time, most of the time is spent in the a_dev transfer. I also tried cudaMemcpy() and got the same result. Is this some kind of initialization time to set up the transfer process? Any hints on how to avoid it are welcome.

Is this some kind of initialization time to set up the transfer process?

Is this the first time you access the device? On Linux, there is a cost of about 1 second per device to warm up the driver. I've also seen an overhead of about 0.001 to 0.01 seconds to initialize the device on first access.
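One way to keep that first-access cost out of the measurement is to touch the device before the timed region and then time each copy on the GPU side. Below is a minimal CUDA Fortran sketch, assuming the cudafor module and single-precision real arrays (the original post does not show the declarations; b_dev and c_dev here are plain device arrays standing in for b_cdev and c_cdev). The function name timed_copy is hypothetical. It uses CUDA events rather than cpu_time, since cpu_time reports processor time used by the host process, not elapsed wall-clock or GPU time.

```fortran
! Sketch (assumptions: cudafor module, real arrays, default stream 0).
! Device allocation and one untimed copy absorb context creation and
! first-access overhead; each later copy is then timed with CUDA events.
program transfer_timing
  use cudafor
  implicit none

  real, allocatable :: a(:,:,:), b(:,:,:), c(:,:,:), check(:,:,:)
  real, device, allocatable :: a_dev(:,:,:), b_dev(:,:,:), c_dev(:,:,:)
  type(cudaEvent) :: t_start, t_stop
  real :: ms_a, ms_b, ms_c
  integer :: istat

  allocate(a(1:2,0:100,0:100), b(1:2,0:100,0:100), c(1:2,0:100,0:100))
  allocate(check(1:2,0:100,0:100))
  a = 1.0; b = 2.0; c = 3.0

  ! The first device allocation creates the CUDA context (untimed).
  allocate(a_dev(1:2,0:100,0:100), b_dev(1:2,0:100,0:100), c_dev(1:2,0:100,0:100))

  ! Untimed warm-up transfer to absorb any remaining first-access cost.
  a_dev = a
  istat = cudaDeviceSynchronize()

  istat = cudaEventCreate(t_start)
  istat = cudaEventCreate(t_stop)

  ! Each host-to-device copy is now timed the same way.
  ms_a = timed_copy(a, a_dev)
  ms_b = timed_copy(b, b_dev)
  ms_c = timed_copy(c, c_dev)

  print '(a,f10.4,a)', 'a_dev copy: ', ms_a, ' ms'
  print '(a,f10.4,a)', 'b_dev copy: ', ms_b, ' ms'
  print '(a,f10.4,a)', 'c_dev copy: ', ms_c, ' ms'

  ! Copy one array back and verify the round trip.
  check = c_dev
  if (any(check /= c)) stop 1

  istat = cudaEventDestroy(t_start)
  istat = cudaEventDestroy(t_stop)
  deallocate(a_dev, b_dev, c_dev)
  deallocate(a, b, c, check)

contains

  ! Hypothetical helper: copies h to d and returns elapsed GPU time in ms.
  real function timed_copy(h, d) result(ms)
    real, intent(in) :: h(:,:,:)
    real, device :: d(:,:,:)
    integer :: ierr
    ierr = cudaEventRecord(t_start, 0)
    d = h
    ierr = cudaEventRecord(t_stop, 0)
    ierr = cudaEventSynchronize(t_stop)
    ierr = cudaEventElapsedTime(ms, t_start, t_stop)
  end function timed_copy

end program transfer_timing
```

With the warm-up in place, the three copies of equal size should report comparable times; if a_dev is still much slower, the cost is likely outside the transfer itself.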

For the driver initialization, you can run the PGI utility 'pgcudainit' as a background process. It holds the driver open so the devices are not powered down between runs and therefore don't need to be powered up again.

Hope this helps,
Mat