PCI Express x16 is said to have 4 GB/s of peak bandwidth per direction, and up to 8 GB/s of concurrent bandwidth. I am curious to know whether CUDA provides a mechanism to use this concurrent bandwidth of 8 GB/s on cards that support PCIe x16.
PCI Express x16 v1.1 has a theoretical maximum bandwidth of 4 GB/s in each direction, but the bandwidth you'll actually get depends on your mainboard chipset and the architecture.
The highest bandwidth I have seen was around 3.5 GB/s with pinned memory.
PCI Express x16 v2.0 supports a maximum of 8 GB/s in each direction, but that is theoretical, of course.
The only consumer cards I know of that support PCI Express x16 v2.0 are the GeForce 8800 GT (G92) and the GeForce 8800 GTS 512MB (G92).
I have no experience with these cards or their actual bandwidth in CUDA, so I can't comment on that.
Try running the bandwidthTest sample from the CUDA SDK to see what you can expect with your card.
What about in an ideal configuration: would
bandwidthTest --memory=pinned
run at 4 GB/s or 8 GB/s? And if it's 4 GB/s, is there (another) CUDA example where the bandwidth would be above 4 GB/s?
You will never get the full 4 GB/s in “real life”. There are always some constraints in the hardware of your system.
You can only go above 4 GB/s if you use PCI Express 2.0, which doubles the theoretical bandwidth of 1.1, assuming you're also using a graphics card with PCI Express 2.0 support.
In that case you might get up to 7 GB/s.
The bandwidthTest from the SDK reflects only what you can expect from your system.
See The Official NVIDIA Forums for a precise answer as to why ~3.4 GiB/s is the peak. If you get this performance, you are making full use of unoverclocked hardware.
P.S. To anyone with a PCIe 2.0 MB and a PCIe 2.0 capable card: please post the output of bandwidthTest “--memory=pinned --mode=shmoo”. I'm sure many of us would love to see how close you can get to the 8GiB/s peak… The actual bandwidth available to the RAM on the MB may become the limiting factor now.