GPU Cluster (2 cards): faulty bandwidthTest estimate

Hello everyone!

I’m helping port a lot of Python code to CUDA so that our analysis code runs faster. We have access to a cluster of GPU cards, and while running the packaged CUDA bandwidthTest I received some interesting results:

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
16777216                2133333.2

This is a very nice result, but I can only assume it’s faulty. The cluster consists of a head node and a “collection” of compute nodes. The nodes are all built around an Intel Core 2 Duo (E6850) CPU on an ASUS P5N32-E motherboard, and each compute node currently sports one NVIDIA GeForce 8800 GTX GPU. (Ultimately we want two GPUs in each compute node.)

As it stands, this is at least an order of magnitude higher than anything believable: the 8800 GTX’s theoretical peak memory bandwidth is about 86.4 GB/s, while the reported device-to-device figure works out to roughly 2,100 GB/s, more than 20 times what the hardware could deliver. Has anyone else run into this, or does anyone have suggestions? My ultimate goal is to run some version of bandwidthTest (modified, perhaps) that gives an accurate result.
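For what it’s worth, here is the kind of minimal cross-check I have in mind. This is my own sketch rather than the SDK sample; the buffer size, repetition count, and variable names are just my own choices. It times a device-to-device copy with CUDA events and synchronizes before reading the timer:

/* Minimal device-to-device bandwidth sketch (my own harness, not the SDK's).
 * Assumes a CUDA-capable device 0 with room for two 16 MB buffers. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t bytes = 16 * 1024 * 1024;   /* 16 MiB, same size the SDK test uses */
    const int    reps  = 10;                 /* average over several copies */

    float *d_src = NULL, *d_dst = NULL;
    cudaMalloc((void **)&d_src, bytes);
    cudaMalloc((void **)&d_dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);              /* wait for the copies to really finish */

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    /* Each device-to-device copy reads and writes every byte, hence the factor of 2. */
    double mbps = (2.0 * bytes * reps) / (ms / 1000.0) / 1.0e6;
    printf("Device to Device: %.1f MB/s\n", mbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_src);
    cudaFree(d_dst);
    return 0;
}

If the SDK test and something like this disagree wildly, that would at least tell me whether the problem is in the timing or in the hardware/driver.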

Thank you!

Which driver are you using?
What’s the full output?
Do the other SDK apps work ok?

I don’t know the driver, but I will ask.

Other apps seem to work ok.

Here’s the output:

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes)   Bandwidth(MB/s)
16777216                1134751.9

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes)   Bandwidth(MB/s)
16777216                1176470.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
16777216                2064516.1

&&&& Test PASSED

The driver is most likely:
NVIDIA Driver for Linux with CUDA Support (169.09)

It is definitely a Linux driver, and almost assuredly an older version.

Thank you.

Yes, that driver is rather old. You need to upgrade to the CUDA 2.0 driver.
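If you can get a shell on one of the compute nodes you don’t need to ask anyone about the driver: on Linux the installed kernel module reports its version in /proc/driver/nvidia/version. Here is a rough sketch that prints that line and also asks the CUDA runtime what it sees on device 0 (the file path is standard for the NVIDIA Linux driver; everything else is just my own scaffolding):

/* Print the installed NVIDIA kernel driver version (Linux) and the CUDA
 * device the runtime sees. /proc/driver/nvidia/version is created by the
 * NVIDIA kernel module; its first line names the driver release. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/driver/nvidia/version", "r");
    if (f && fgets(line, sizeof line, f))
        printf("Kernel module: %s", line);
    if (f)
        fclose(f);

    struct cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess)
        printf("Device 0: %s, %d MB global memory\n",
               prop.name, (int)(prop.totalGlobalMem / (1024 * 1024)));
    return 0;
}

If I recall correctly, the 169.x series shipped alongside CUDA 1.1, so anything in that range is well behind CUDA 2.0 and worth upgrading before chasing the bandwidth numbers any further.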