Host2Device bandwidth, Kepler vs. Fermi

I have compared host-to-device memory transfers between a Kepler GT640 and a Fermi GT530. We get about 2 GB/s on the GT530 but only 1.2 GB/s on the GT640. Is this normal for these two cards? Is the GT640 really slower than the GT530 at transferring data?

On the other hand, CUDA kernel execution is much faster on the GT640 (as it should be, of course).

Both values are much lower than they should be. Are you using pinned memory? Do you have any other cards in PCI-e slots that might be competing for bandwidth?
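For reference, here is a minimal sketch of how pinned (page-locked) memory can be compared against ordinary pageable memory for host-to-device transfers. This is an illustrative example, not the bandwidthTest source; the buffer size and helper name are my own, and the measured numbers depend entirely on the hardware.

```cuda
// Sketch: compare pageable vs. pinned host-to-device transfer bandwidth.
// Hypothetical helper name (h2dBandwidthGBs); error checking omitted for brevity.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static float h2dBandwidthGBs(void *hostBuf, void *devBuf, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (bytes / 1e9f) / (ms / 1e3f);   // GB/s
}

int main() {
    const size_t bytes = 64 << 20;         // 64 MiB test buffer (arbitrary)
    void *devBuf = NULL, *pinned = NULL;
    cudaMalloc(&devBuf, bytes);
    void *pageable = malloc(bytes);        // ordinary pageable host memory
    cudaHostAlloc(&pinned, bytes, cudaHostAllocDefault); // page-locked memory

    printf("pageable: %.2f GB/s\n", h2dBandwidthGBs(pageable, devBuf, bytes));
    printf("pinned:   %.2f GB/s\n", h2dBandwidthGBs(pinned,   devBuf, bytes));

    free(pageable);
    cudaFreeHost(pinned);
    cudaFree(devBuf);
    return 0;
}
```

Pinned transfers are typically noticeably faster because the driver can DMA directly from the buffer instead of staging through an internal pinned buffer.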

On what mainboards were these values measured? What is their main memory bandwidth? What is the PCIe configuration?

Thanks for the reply. We found that the embedded Intel graphics in the CPU seems to have been slowing the transfers down. After disabling the embedded graphics, the GT640 reaches about 2 GB/s on PCIe 2.0. On PCIe 3.0, the GT640 reaches about 5 GB/s.

Thanks for the reply. On PCIe 3.0, the GT640 reaches about 5 GB/s. The theoretical maximum should be 8 GB/s, right? So what do you think of this speed?

Is this running bandwidthTest with the --memory=pinned option? These numbers are still a factor of 2 low compared to most cards I’ve tried. On PCI-Express 2.0, I would expect between 4-6 GB/sec (theoretical max is 8 GB/sec) for pinned memory transfers, and on PCI-Express 3.0, there have been reports as high as 12 GB/sec (theoretical max is 16 GB/sec). All of these tests are on higher end devices, like the GTX 680, but I would not expect such a huge difference.