I’m trying to help port a lot of Python code into CUDA so that our analysis code runs faster. We have access to a cluster of GPU cards, and I was just running through the packaged CUDA bandwidthTest and received some interesting results:
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth (MB/s)
This is a very nice result, but I can only assume it’s faulty. The cluster consists of a head node and a “collection” of compute nodes. The nodes are all built around an Intel Core 2 Duo (E6850) CPU on an ASUS P5N32-E motherboard. The compute nodes each sport one NVIDIA GeForce 8800 GTX GPU. (Ultimately we want to have two GPUs in each compute node.)
As it stands, this is at least one order of magnitude greater than what I think would be a believable result. Has anyone else run into this problem, or does anyone have any suggestions? My ultimate goal is to be able to run some sort of bandwidthTest (modified, perhaps) that gives an accurate result.
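For what it’s worth, a hand-rolled device-to-device timing loop is one way to sanity-check the packaged bandwidthTest. The sketch below (assumed buffer size and repetition count, not a definitive benchmark) times repeated `cudaMemcpy` device-to-device copies with CUDA events. Note that, like the SDK's bandwidthTest, it counts each copy as both a read and a write (hence the factor of 2), and on-card GDDR copies are expected to be much faster than anything over PCIe, which may account for part of the surprisingly large number:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 << 20;   // 32 MB transfer (assumed size)
    const int reps = 10;             // average over several copies

    float *src, *dst;
    cudaMalloc((void**)&src, bytes);
    cudaMalloc((void**)&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up copy so driver/context initialization isn't timed
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Each device-to-device copy moves the data twice through memory
    // (one read + one write), so count 2x the bytes per copy.
    double mbps = 2.0 * bytes * reps / (ms / 1000.0) / (1 << 20);
    printf("Device-to-device bandwidth: %.1f MB/s\n", mbps);

    cudaFree(src); cudaFree(dst);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    return 0;
}
```

Comparing this against the packaged test's output, and varying the transfer size, should at least tell you whether the reported figure is a measurement artifact or a genuine reflection of on-card memory bandwidth.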