Memory read and write to device give different timings

I was trying a simple experiment using CUDA in which I copy data (an image of size 4096*4096) to and from the GPU device.

Reading the data back from the device takes longer than writing the same amount of data to the device.

The code is as follows:

	CUT_SAFE_CALL( cutStartTimer( memoryImageTimerPS));
	for(int i=0; i < 6;i++)
	{
		cudaHostToDeviceMemcpyWrapper(d_iData,(pixelData),size); // copy host to device
	}
	CUT_SAFE_CALL( cutStopTimer( memoryImageTimerPS));

	CUT_SAFE_CALL( cutStartTimer( memoryTransferPS));
	for(int i=0; i < 6;i++)
	{
		cudaDeviceToHostMemcpyWrapper((pixelData),d_oData,size); // copy device to host
	}
	CUT_SAFE_CALL( cutStopTimer( memoryTransferPS));

I cannot figure out why this is happening. The difference keeps growing as I increase the data size.

Please let me know if anybody has an idea on why it happens.

How much of a difference are you seeing? Depending on your motherboard and other factors, transfer rates may not always be symmetrical.

It looks like you get different transfer speeds when writing to and reading from the device via PCIe. I have seen the same effect on one of my machines (6 GB/s vs. 4 GB/s).

Writing to the device is almost twice as fast as reading from it.
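To put numbers on the asymmetry discussed in this thread, here is a minimal sketch that measures the copy rate in each direction with CUDA events and prints it in GB/s. It is not the original poster's code: the names `copyRateGBs`, `hostBuf`, and `devBuf` are made up for illustration, the image is assumed to hold 1 byte per pixel, and the host buffer is plain `malloc` memory, which may or may not match what the wrappers in the question do. One untimed warm-up copy is done first so that one-time setup cost does not land in the measurement.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err));                    \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: time `reps` copies of `bytes` bytes and
// return the average rate in GB/s (1 GB = 1e9 bytes).
static double copyRateGBs(void* dst, const void* src, size_t bytes,
                          cudaMemcpyKind kind, int reps)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up copy, not timed

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;  // elapsed time between the two events, in milliseconds
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    return (double)bytes * reps / (ms * 1.0e-3) / 1.0e9;
}

int main()
{
    const size_t bytes = 4096u * 4096u;  // assumption: 1 byte per pixel
    const int reps = 6;                  // same repeat count as in the question

    // Pageable host memory (assumption; the original allocation is not shown).
    unsigned char* hostBuf = (unsigned char*)std::malloc(bytes);
    if (!hostBuf) return EXIT_FAILURE;
    std::memset(hostBuf, 0, bytes);

    unsigned char* devBuf = NULL;
    CHECK(cudaMalloc((void**)&devBuf, bytes));

    double h2d = copyRateGBs(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice, reps);
    double d2h = copyRateGBs(hostBuf, devBuf, bytes, cudaMemcpyDeviceToHost, reps);
    std::printf("host to device: %.2f GB/s\n", h2d);
    std::printf("device to host: %.2f GB/s\n", d2h);

    CHECK(cudaFree(devBuf));
    std::free(hostBuf);
    return 0;
}
```

Reporting GB/s rather than raw time makes results comparable across data sizes. If the host buffer is allocated with `cudaMallocHost` instead of `malloc`, the rates in both directions will usually change, so it is worth noting which kind of host memory a given measurement used.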