I have run into a problem with memory copy speed, and I don't understand the cause.
I have two PCs, set up as follows:
PC1 (workstation):
CPU: 2 x Xeon 3.4 GHz
GPU: GTX 285 (compute capability 1.3)
CUDA version: 2.2 Beta
Bandwidth reported by the CUDA SDK example:
Host to Device: 920 MB/s
Device to Host: 872 MB/s
PC2 (IPC):
CPU: Core 2 Quad 2.4 GHz
GPU: 9800 GTX (compute capability 1.1)
CUDA version: 2.0
Bandwidth reported by the CUDA SDK example:
Host to Device: 1766 MB/s
Device to Host: 1433 MB/s
My code copies 81925000 unsigned short values from host to device,
and then copies 2 81925000 char values from device to host.
The transfer times are very strange:
PC1: 63 ms
PC2: 74 ms
Why? I thought the copy time would depend on bandwidth, but PC1, which has the lower measured bandwidth, is the faster of the two.
Is there anything I have missed?
Hi,
You are transferring fairly small packages of 40MB*sizeof(unsigned short). Maybe the time measurement is not precise enough, due to coarse timer resolution and latency effects of the memory transfer.
Also, unless you’re using pinned memory (doubtful given the figures from the CUDA SDK bandwidthTest), you’re also timing the CPU’s memory subsystem, since the CUDA runtime has to copy data into its own pinned memory buffers prior to the PCIe transfer. FWIW, on my machine, the latency of a PCIe transfer from pinned memory is about 10 microseconds.
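To check both points above, here is a minimal sketch that times the same host-to-device copy from pageable (malloc) memory and from pinned (cudaMallocHost) memory, using CUDA events and averaging over several repetitions to reduce timer noise. The element count of 8192 * 5000 is my assumption, chosen only to be close to the roughly 40M elements mentioned above; the helper names `mb_per_sec` and `time_copy_ms` and the repetition count are made up for illustration. It is not the original poster's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Convert a byte count and a time in milliseconds to MB/s (1 MB = 1e6 bytes).
double mb_per_sec(size_t bytes, double ms) {
    return (bytes / 1.0e6) / (ms / 1.0e3);
}

// Average time in ms of one cudaMemcpy, measured with CUDA events over `reps` runs.
float time_copy_ms(void* dst, const void* src, size_t bytes,
                   cudaMemcpyKind kind, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up copy, not timed

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until the stop event has completed

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    const size_t count = 8192u * 5000u;  // ASSUMED element count (about 41M)
    const size_t bytes = count * sizeof(unsigned short);
    const int reps = 10;                 // arbitrary repetition count

    // Pageable host buffer: the runtime stages it through its own pinned buffers.
    unsigned short* pageable = (unsigned short*)std::malloc(bytes);
    if (pageable == NULL) {
        std::fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    // Pinned (page-locked) host buffer: can be transferred by DMA directly.
    unsigned short* pinned = NULL;
    unsigned short* dev = NULL;
    CHECK(cudaMallocHost((void**)&pinned, bytes));
    CHECK(cudaMalloc((void**)&dev, bytes));
    std::memset(pageable, 0, bytes);
    std::memset(pinned, 0, bytes);

    float t_pageable = time_copy_ms(dev, pageable, bytes, cudaMemcpyHostToDevice, reps);
    float t_pinned   = time_copy_ms(dev, pinned,   bytes, cudaMemcpyHostToDevice, reps);

    std::printf("pageable H2D: %.2f ms (%.0f MB/s)\n",
                t_pageable, mb_per_sec(bytes, t_pageable));
    std::printf("pinned   H2D: %.2f ms (%.0f MB/s)\n",
                t_pinned, mb_per_sec(bytes, t_pinned));

    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(pinned));
    std::free(pageable);
    return 0;
}
```

Build it with something like `nvcc -o copytest copytest.cu` (the file name is arbitrary) and run it on both machines. If the pinned figure is clearly better than the pageable one on a machine, the host memory subsystem is part of what the original measurement timed, not just the PCIe link.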