For quite some time now I've had the feeling that data transfers from host to device and back are far too slow, and that this bottleneck is a real problem for me.
I'm using a GeForce 8600 GT in an HP Compaq 6100 MT, in a PCI Express slot.
If I understand correctly, the maximum speed of PCI Express (which the GPU should be able to use to the fullest) is 4 GB/s.
I’m taking this bit of information from here:
[url="http://www.nvidia.com/object/geforce_8600_features.html"]http://www.nvidia.com/object/geforce_8600_features.html[/url]
However, when I run bandwidthTest.exe from the SDK I get a host-to-device bandwidth of ~600-650 MB/s and a device-to-host bandwidth of ~750-800 MB/s. These transfer rates are consistent with the results I've measured in other CUDA code I've written in the past.
My question is: is this normal? These speeds are only about a sixth of the maximum potential (if I understand that maximum correctly). Is this a faulty card or motherboard?
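For reference, the kind of measurement I'm talking about looks roughly like this (a minimal sketch, not my exact code; the 32 MB buffer and 10 iterations are arbitrary and error checking is omitted):

[code]
// Time repeated cudaMemcpy calls with CUDA events and report MB/s.
#include <cuda_runtime.h>
#include <stdio.h>

// Times 'iters' copies of 'bytes' bytes in the given direction and returns MB/s.
static float copy_rate(void *dst, const void *src, size_t bytes,
                       int iters, cudaMemcpyKind kind)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    // MB/s = total megabytes copied / elapsed seconds
    return (float)((double)bytes * iters / (1024.0 * 1024.0) / (ms / 1000.0));
}

int main(void)
{
    const size_t bytes = 32 * 1024 * 1024;   // same 32 MB as bandwidthTest's default
    const int    iters = 10;

    float *h_buf = (float *)malloc(bytes);   // pageable host memory
    float *d_buf;
    cudaMalloc((void **)&d_buf, bytes);

    printf("Host-to-device: %.1f MB/s\n",
           copy_rate(d_buf, h_buf, bytes, iters, cudaMemcpyHostToDevice));
    printf("Device-to-host: %.1f MB/s\n",
           copy_rate(h_buf, d_buf, bytes, iters, cudaMemcpyDeviceToHost));

    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
[/code]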
Some mainboards don't even have 16 PCI Express lanes. For example, my ASRock 4CoreDual-SATA2 only has 4 lanes connected; it's a limitation of the chipset. Before you suspect a bug, check your mainboard's specs.
I've been seeing the same transfer rates on my 8600 GT and now on my 8800 GTS, but only when I measure with pageable memory. Have you tried running bandwidthTest.exe with -memory=pinned (if I remember the option correctly)? In my case I was getting 2-2.5 GB/s with pinned memory; see the sketch at the end of this post. You might also try increasing the transfer size from the default 32 MB to 64 MB or even more.
My motherboard is a Gigabyte GA-K8N51GMF-9-RH (a pretty old Socket 939 board based on the NVIDIA GeForce 6100 chipset).
Also, the real maximum you can expect in practice is around 2.5 GB/s, up to 3 GB/s with a really good mobo. 4 GB/s is just a marketing figure.
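To show what I mean by pinned memory in your own code: the only change from a pageable measurement is allocating the host buffer with cudaMallocHost() instead of malloc() and freeing it with cudaFreeHost(). A minimal sketch (the 64 MB size and iteration count are just examples, error checking omitted):

[code]
// Timed host-to-device copies from a pinned (page-locked) host buffer.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t bytes = 64 * 1024 * 1024;   // 64 MB instead of the default 32 MB
    const int    iters = 10;

    float *h_buf;
    cudaMallocHost((void **)&h_buf, bytes);  // pinned host memory instead of malloc()
    float *d_buf;
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device (pinned): %.1f MB/s\n",
           (double)bytes * iters / (1024.0 * 1024.0) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
[/code]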
If you’re getting the same speeds with pinned and pageable, it’s almost certainly a motherboard problem. I’ve never seen pinned be less than 1.5x pageable.