Data transfer times

thomi · August 25, 2010, 4:21pm

Hi

I have a question about data transfer times. When I transfer several arrays of the same size and time each transfer, the first one takes longer than the later ones.

For example:

! Copyin data
a_dev = a(1:2,0:100,0:100)
b_cdev = b(1:2,0:100,0:100)
c_cdev = c(1:2,0:100,0:100)

Most of the time is spent in the a_dev transfer, as measured with cpu_time calls. I also tried cudaMemcpy(), with the same result. Is this some kind of initialization time to set up the transfer process? Any hints on how to avoid it are welcome.

MatColgrove · August 25, 2010, 4:34pm

Is this some kind of initialization time to set up the transfer process?

Is this the first time you access the device? On Linux, there is an initialization cost of about 1 second per device to warm up the driver. I’ve also seen an overhead of roughly 0.001 to 0.01 seconds to initialize the device at first access.

For the driver initialization, you can run the PGI utility ‘pgcudainit’ as a background process. This holds the driver open so that it won’t power down the devices, and they then don’t need to be powered up again.
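
To see how much of the measured time is first-access overhead rather than the copy itself, one option is to touch the device once before the timed transfer. The following is only a minimal CUDA Fortran sketch under some assumptions: the array a is taken to be default real with the bounds from the question, dummy and dummy_dev are hypothetical one-element warm-up buffers, and system_clock is used instead of cpu_time to get wall-clock time.

```fortran
program warmup_timing
  use cudafor
  implicit none

  ! Host array with the bounds used in the question (type is an assumption).
  real :: a(1:2,0:100,0:100)
  real, device, allocatable :: a_dev(:,:,:)

  ! Hypothetical one-element buffers used only to trigger initialization.
  real :: dummy(1)
  real, device, allocatable :: dummy_dev(:)

  integer :: istat
  integer(kind=8) :: c0, c1, rate

  a = 1.0
  dummy = 0.0

  ! Warm-up: the first device access pays the one-time initialization cost.
  call system_clock(c0, rate)
  allocate(dummy_dev(1))
  dummy_dev = dummy
  istat = cudaDeviceSynchronize()
  call system_clock(c1)
  print *, 'first device access (s): ', real(c1 - c0, 8) / real(rate, 8)

  allocate(a_dev(1:2,0:100,0:100))

  ! Timed transfer: the device is already initialized at this point.
  call system_clock(c0)
  a_dev = a(1:2,0:100,0:100)
  istat = cudaDeviceSynchronize()
  call system_clock(c1)
  print *, 'a_dev transfer (s):      ', real(c1 - c0, 8) / real(rate, 8)

  deallocate(a_dev, dummy_dev)
end program warmup_timing
```

If the initialization cost is what you are seeing, the first number should be much larger than the second. This does not remove the cost; it only moves it out of the section you are timing.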

Hope this helps,
Mat