GTX 470 Memory Bandwidth Issue

Hello everyone,

We just installed a GeForce GTX 470 on a machine with a Core 2 Duo 2.33GHz and 2.5GB of RAM.

The NSight plugin is installed on VS08. I am trying a simple job: moving 768MB (I tried other sizes too) to the GPU and then back to the host. With the NSight memory profiler I get shocking results: an H2D peak bandwidth of only around 1.7GB/s, and around 1.1GB/s for copying back to host memory. There is no kernel call in the code, just memory copies.

Theoretically we should be able to reach around 65GB/s peak bandwidth (320 bits/cycle at 1674M cycles/sec). Is it just me, or am I doing something wrong?

I have attached a screenshot of the profiler results and also the simple code.
cuda_prof.png

You’ve confused device<->host bandwidth (i.e. PCIe slot bandwidth) with device memory bandwidth (the GPU’s RAM to the GPU’s ALUs).

Copying data from host to device and back will be that slow (you can speed it up a bit by using pinned host memory). Accesses to device memory from the kernel will be fast.
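
If you want to see what pinned memory buys you on your own machine, something along these lines could be used. It is only a sketch: the 256 MB buffer size and the helper names (`gbPerSec`, `check`, `timeCopyMs`) are my own choices for illustration, not taken from the attached code, and the numbers you get will depend on the chipset and the PCIe slot.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Convert a transfer of `bytes` that took `ms` milliseconds into GB/s (1 GB = 1e9 bytes).
static double gbPerSec(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / 1.0e9) / (static_cast<double>(ms) / 1.0e3);
}

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Time a single cudaMemcpy with CUDA events; returns elapsed milliseconds.
static float timeCopyMs(void* dst, const void* src, size_t bytes, cudaMemcpyKind kind) {
    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&stop), "cudaEventCreate");
    check(cudaEventRecord(start, 0), "cudaEventRecord");
    check(cudaMemcpy(dst, src, bytes, kind), "cudaMemcpy");
    check(cudaEventRecord(stop, 0), "cudaEventRecord");
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");
    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 256u * 1024u * 1024u;  // 256 MB, an arbitrary test size (assumption)

    void* dev = NULL;
    check(cudaMalloc(&dev, bytes), "cudaMalloc");

    // Ordinary pageable host buffer, as returned by plain malloc.
    void* pageable = std::malloc(bytes);
    if (pageable == NULL) {
        std::fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    std::memset(pageable, 1, bytes);  // touch the pages before timing

    // Page-locked (pinned) host buffer allocated through the CUDA runtime.
    void* pinned = NULL;
    check(cudaMallocHost(&pinned, bytes), "cudaMallocHost");
    std::memset(pinned, 1, bytes);

    float msPageable = timeCopyMs(dev, pageable, bytes, cudaMemcpyHostToDevice);
    float msPinned   = timeCopyMs(dev, pinned,   bytes, cudaMemcpyHostToDevice);

    std::printf("H2D pageable: %.2f GB/s\n", gbPerSec(bytes, msPageable));
    std::printf("H2D pinned:   %.2f GB/s\n", gbPerSec(bytes, msPinned));

    check(cudaFreeHost(pinned), "cudaFreeHost");
    std::free(pageable);
    check(cudaFree(dev), "cudaFree");
    return 0;
}
```

Pinned buffers come out of physical RAM that the OS can no longer page out, so on a 2.5GB machine it is probably wise to keep them well below the 768MB you are copying now.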

You mean 6GB/s - that is about as fast as the PCI-e 2.0 bus can possibly go. The GTX470 has a global memory bandwidth of about 133 GB/s, but device-host copies are limited by the speed of the PCI-e bus.
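
For anyone trying to reconcile the three figures in this thread (65, 133 and 6), the arithmetic can be sketched as below. The GDDR5 data rate of 3348 MT/s (twice the 1674M figure quoted above) and the PCI-e 2.0 x16 parameters (5 GT/s per lane, 8b/10b encoding) are values I am assuming from the public specs, not numbers read off the profiler, and the function names are made up for illustration. It is host-only code, so any C++ compiler or nvcc should build it.

```cuda
#include <cstdio>

// Peak DRAM bandwidth in GB/s: bus width (bits) times transfers per second,
// divided by 8 bits per byte.
static double memBandwidthGBs(int busBits, double megaTransfersPerSec) {
    return busBits * megaTransfersPerSec * 1.0e6 / 8.0 / 1.0e9;
}

// Peak PCI-e bandwidth in GB/s per direction: lanes times raw rate per lane,
// scaled by the line-encoding efficiency, divided by 8 bits per byte.
static double pcieBandwidthGBs(int lanes, double gigaTransfersPerSec, double encodingEfficiency) {
    return lanes * gigaTransfersPerSec * encodingEfficiency / 8.0;
}

int main() {
    // The estimate from the first post: 320 bits at 1674M transfers/sec.
    std::printf("320-bit bus at 1674 MT/s: %.1f GB/s\n", memBandwidthGBs(320, 1674.0));

    // Assumed effective GDDR5 data rate (twice the figure above).
    std::printf("320-bit bus at 3348 MT/s: %.1f GB/s\n", memBandwidthGBs(320, 3348.0));

    // Assumed PCI-e 2.0 x16 link: 5 GT/s per lane, 8b/10b encoding (80% efficient).
    std::printf("PCI-e 2.0 x16 (one way):  %.1f GB/s\n", pcieBandwidthGBs(16, 5.0, 0.8));
    return 0;
}
```

Under those assumptions this works out to roughly 67, 134 and 8 GB/s. The 8 GB/s is the raw link limit in one direction; packet and protocol overhead usually bring real copies down to something nearer the 6 GB/s mentioned above.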

My bad! Yes, I realize that now. Memory transfers seem to be a large overhead, then!