The speed of data transfer between GPU and CPU

xiaolaji · April 23, 2009, 2:25am

As I recall, PCIe 1.0 x16 can reach 4 GB/s when transferring data between the GPU and the CPU.
But in my experiment, it takes 0.322 ms to transfer a 368×288 image (32 bits per pixel, 423,936 bytes) from GPU to CPU, which means the speed is only 1.3 GB/s.
I allocate the GPU memory with cudaMalloc and use the following code to measure the speed.
int num = 1000;

CUDA_SAFE_CALL( cudaThreadSynchronize() );
CUT_SAFE_CALL( cutResetTimer(hTimer) );
CUT_SAFE_CALL( cutStartTimer(hTimer) );

// Repeat the copy num times so the timer yields an average.
for (int i = 0; i < num; i++)
{
    // DATA_SIZE = 368 * 288 * sizeof(float)
    CUDA_SAFE_CALL( cudaMemcpy(d_Data, h_Data, DATA_SIZE, cudaMemcpyHostToDevice) );
}

CUDA_SAFE_CALL( cudaThreadSynchronize() );
CUT_SAFE_CALL( cutStopTimer(hTimer) );
gpuTime = cutGetTimerValue(hTimer) * 1.0 / num;  // average time per copy, in ms
printf("…data transfer() time: %f msecs; \n", gpuTime);

So I am a little confused.

Could someone give me some advice?

Thank you very much!
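
As a sanity check on the figures in this post, the measured rate and the PCIe 1.0 x16 ceiling work out as follows. This takes 1 GB as 10^9 bytes and uses the PCIe 1.0 rate of 250 MB/s per lane per direction; real transfers stay below that theoretical figure because of protocol overhead.

```latex
\[
\text{measured} = \frac{368 \times 288 \times 4\ \text{B}}{0.322\ \text{ms}}
= \frac{423\,936\ \text{B}}{3.22 \times 10^{-4}\ \text{s}}
\approx 1.32 \times 10^{9}\ \text{B/s} \approx 1.3\ \text{GB/s}
\]
\[
\text{theoretical} = 16\ \text{lanes} \times 250\ \text{MB/s per lane} = 4\ \text{GB/s}
\]
```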

Demq · April 23, 2009, 4:01am

Hi,

I think this may be because you are using pageable host memory instead of pinned memory (which is allocated with cudaMallocHost()). Your transfer size is also not optimal.

A quick way to check this is to run the bandwidthTest in two modes and look for the memory transfer closest to yours:
bandwidthTest --mode=shmoo --memory=pinned
bandwidthTest --mode=shmoo --memory=pageable

With pageable memory I get 2897.8 MB/s for 512k bytes, but ~5000 MB/s with pinned memory, on PCI-E 2.0 x16.

Cheers!

Pimbolie1979 · April 23, 2009, 12:53pm

I get the same results as Demq.

But how do I use pinned memory with cudaMallocHost()?

Demq · April 23, 2009, 4:55pm

Just look up the function in the manual. It is how you allocate pinned memory on the host, instead of malloc(), which allocates pageable memory.
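
A minimal sketch of the comparison, assuming a CUDA-capable device and the CUDA runtime API: it copies the same 368 * 288 float buffer to the GPU once from a malloc() buffer and once from a cudaMallocHost() buffer, and prints the average rate for each. The helper names (CHECK, gb_per_s, time_h2d) are made up for illustration, and it times with CUDA events rather than the cutil timer used in the original post. The numbers it prints depend on the system.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Convert a byte count and a time in milliseconds to GB/s (1 GB = 1e9 bytes).
static double gb_per_s(size_t bytes, double ms) {
    return (double)bytes / (ms * 1.0e-3) / 1.0e9;
}

// Average time in ms of `reps` host-to-device copies of `bytes` bytes.
static float time_h2d(void *d_dst, const void *h_src, size_t bytes, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until all copies have finished

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    const size_t bytes = 368 * 288 * sizeof(float);  // 423,936 bytes, as in the post
    const int reps = 1000;

    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, bytes));

    // Pageable host memory: ordinary malloc().
    float *h_pageable = (float *)malloc(bytes);
    if (h_pageable == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    memset(h_pageable, 0, bytes);

    // Pinned (page-locked) host memory: cudaMallocHost().
    float *h_pinned = NULL;
    CHECK(cudaMallocHost((void **)&h_pinned, bytes));
    memset(h_pinned, 0, bytes);

    float ms_pageable = time_h2d(d_data, h_pageable, bytes, reps);
    float ms_pinned = time_h2d(d_data, h_pinned, bytes, reps);

    printf("pageable: %f ms per copy, %.2f GB/s\n",
           ms_pageable, gb_per_s(bytes, ms_pageable));
    printf("pinned:   %f ms per copy, %.2f GB/s\n",
           ms_pinned, gb_per_s(bytes, ms_pinned));

    // Pinned memory is released with cudaFreeHost(), not free().
    CHECK(cudaFreeHost(h_pinned));
    free(h_pageable);
    CHECK(cudaFree(d_data));
    return 0;
}
```

Compile with nvcc. Note that pinned memory cannot be paged out by the operating system, so allocating large amounts of it can hurt overall system performance.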

xiaolaji · April 27, 2009, 7:28am

Thanks for your help! :rolleyes:
