I just set up an S870 on CentOS 5.0 with the 177.67 drivers and CUDA 2.0. It works fine, but I’m getting poor device-device bandwidth results. bandwidthTest from the SDK reports the following:
Running on...
  device 0: Tesla C870

Quick Mode
Host to Device Bandwidth for Pageable memory
  Transfer Size (Bytes)   Bandwidth (MB/s)
  33554432                1988.4

Quick Mode
Device to Host Bandwidth for Pageable memory
  Transfer Size (Bytes)   Bandwidth (MB/s)
  33554432                1739.2

Quick Mode
Device to Device Bandwidth
  Transfer Size (Bytes)   Bandwidth (MB/s)
  33554432                31036.3
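For context, the device-to-device figure is essentially a timed on-device cudaMemcpy. Here’s a minimal sketch of that measurement (my own simplification, not bandwidthTest’s exact code; note the bytes are counted twice since the copy both reads and writes device memory):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 33554432;   // 32 MB, same transfer size as Quick Mode
    void *src = 0, *dst = 0;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a single device-to-device copy with CUDA events.
    cudaEventRecord(start, 0);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Read + write of device memory, so count the bytes twice.
    printf("Device to Device Bandwidth: %.1f MB/s\n",
           (2.0 * bytes / (1 << 20)) / (ms / 1000.0));

    cudaFree(src);
    cudaFree(dst);
    return 0;
}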
Sorry, I should have mentioned that. The S870 is attached to an HP ProLiant DL160 G5 with a single 2.0 GHz Xeon. The G80 test was run on my desktop, which is slightly different: a 2.4 GHz Core 2 Duo on an Asus P5N32-E SLI motherboard.
Perhaps unrelated, but with 177.80 and bandwidthTest, I got reduced bandwidth every second time I ran it: reduced on the first run, full on the second, reduced on the third, full on the fourth, and so on.
Check your device clocks with deviceQuery. I bet they are decreased from what they should be.
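If you’d rather not scan the full deviceQuery output, the same check takes a few lines of the runtime API (a throwaway snippet of mine, not from the SDK; cudaDeviceProp.clockRate is reported in kHz):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // A healthy C870 should report ~1.35 GHz; the down-clocked
        // ones in this thread show up at ~1.19 GHz.
        printf("device %d: %s, core clock %.2f GHz\n",
               dev, prop.name, prop.clockRate / 1.0e6);
    }
    return 0;
}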
I have the same problem on a D870 with drivers 177.67.
Well sort of. GPU 0 is down clocked and GPU 1 isn’t in the D870. I’ve got a bug on file with NVIDIA, but nothing has come of it yet.
It seems that the drivers are deciding since the device isn’t attached to a display and isn’t doing anything useful, it should be down clocked to save power :)
I reverted to the CUDA 2.0 beta, which works fine, and will stick with it until the problem is solved. I haven’t tried any newer versions yet, since I have yet to receive a message saying the bug is closed.
I’m seeing the same problem with an S870, CUDA 2.0, 177.73, and a 680i motherboard (P6N Diamond). Devices 0 and 2 are clocked at 1.19 GHz, and their device-device bandwidth is half that of devices 1 and 3: around 30 GB/s on devices 0 and 2 versus 60+ GB/s on devices 1 and 3.
I also have the problem on a setup with a D870, CUDA 2.0, 177.73, and an Intel X38 motherboard (DX38BT). Device 0 is clocked at 1.19 GHz and has half the bandwidth of device 1.
The problem does not show up on either system when using CUDA 1.1. I haven’t tried 2.0 beta2.
Well, CUDA 2.1 is just around the corner, so maybe they completely ignored this problem for 2.0 in order to fix it in 2.1 (fingers crossed). It seems a shame to me that a majority of the original Tesla line cannot be used with CUDA 2.0 in a production setup, despite the problem having been reported from day one. Stupid if you ask me.
If this persists with 2.1, you can be sure that I’ll be making a lot more noise about it.
Have you tried the S1070 driver (I know, seems weird)? It’s in our bug database as fixed and should be in the S1070 driver (177.70.18 or whatever), but for whatever reason it’s apparently not in 177.73 or 177.80 as far as I can tell. I’ve been told that 2.1 drivers will definitely contain the fix, though.
Odd, it doesn’t show as fixed in my bug view. Maybe it is tagged that way in the internal system. Also, browsing through, I noticed at least one other duplicate (though for 177.73, whereas the bug I posted mentioned the previous driver version).
Thanks for the info on 177.70.18. It works like a charm on the D870 I’ve got here.
Yours is marked as a duplicate of the S870 bug, and I don’t know why no one told you that… anyway, glad to hear that it works on a D870. That fix is definitely in the 2.1 beta driver.
I’ve just discovered this problem with my Tesla S870 and the 177.82 driver. Two of the GPUs are attached to one IBM x3755 and the other two are attached to a different x3755. One x3755 also has a Quadro FX 5800 and the other a 5600. For each x3755, one of the Tesla GPUs clocks at 1.35 GHz and the other at 1.19 GHz. The slower-clocked GPU reports a device-to-device bandwidth that is around half what it should be (only ~30 GB/s instead of ~60 GB/s).
Both 177.70.18 and 180.22 contain the bandwidth fix. (Well, 177.70.18 definitely does; I haven’t tried 180.22, but The Powers That Be tell me that it does.)