Memory copy speed

I have run into a problem with memory copy speed, and I don't know what causes it.
I have two PCs set up as follows:
CPU:Xeon 3.4G*2
VGA:GTX 285(compute capability 1.3)
CUDA version:2.2Beta
Bandwidth measured by the CUDA SDK example:
Host to Device:920MB/Sec
Device to Host:872MB/Sec

CPU:Core2 2.4G Q core
VGA:9800GTX(compute capability 1.1)
CUDA version:2.0
Bandwidth measured by the CUDA SDK example:
Host to Device:1766MB/Sec
Device to Host:1433MB/Sec

My code copies 8192*5000 unsigned short values from host to device,
and then copies 2*8192*5000 char values from device to host.

The transfer time is very strange:

Why? I would expect the copy time to depend on the bandwidth, but that does not seem to be the case.
Is there anything I missed?

You are transferring fairly small packages of about 40M*sizeof(unsigned short) bytes. Maybe the time measurement is not precise enough, because of coarse timer steps and the latency of the memory transfer.
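
One common way to reduce the effect of a coarse timer is to repeat the operation many times and divide the total by the repeat count. Below is a minimal host-side sketch of that idea in C++. The names `average_ms` and `time_host_copy` are made up for illustration, and the `memcpy` is only a stand-in for the real transfer. With an asynchronous GPU call you would also need to synchronize with the device before reading the clock.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Average wall-clock time of one call to work(), in milliseconds.
// Timing the whole loop once keeps timer granularity from dominating.
template <typename Work>
double average_ms(Work work, int repeats) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / repeats;
}

// Hypothetical stand-in workload: a plain host-to-host copy of `bytes` bytes.
double time_host_copy(std::size_t bytes, int repeats) {
    std::vector<unsigned char> src(bytes, 1);
    std::vector<unsigned char> dst(bytes, 0);
    return average_ms(
        [&] { std::memcpy(dst.data(), src.data(), bytes); }, repeats);
}
```

Replacing the lambda body with the actual copy call gives a per-transfer average that is much less sensitive to the timer resolution than a single measurement.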

Also, unless you’re using pinned memory (doubtful given the figures from the CUDA SDK bandwidthTest), you’re also timing the CPU’s memory subsystem, since the CUDA runtime has to copy data into its own pinned memory buffers prior to the PCIe transfer. FWIW, on my machine, the latency of a PCIe transfer from pinned memory is about 10 microseconds.

My PC:
CPU = Core i7-920 (quad core at 2.8GHz)
Memory = Triple-channel DDR3-1600MHz
VGA card = NVIDIA 9800GTX+ with 512MB memory

I use Windows XP and CUDA 2.1.

I tested the bandwidth with the CUDA SDK bandwidthTest example (“NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release\bandwidthTest.exe”).

My results are:
5200 MB/s from PC to GPU
4700 MB/s from GPU to PC

Next I will test an NVIDIA GTX 285.

Can anybody post their results?