Bandwidth problems with S870 and 177.67

Hi,

I just set up an S870 on CentOS 5.0 with the 177.67 driver and CUDA 2.0. It works fine, but I’m getting poor device-to-device bandwidth results. bandwidthTest from the SDK reports the following:

Running on…
device 0:Tesla C870

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1988.4

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1739.2

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 31036.3

In contrast, the same test on a G80 in my desktop, with the same OS, driver, and CUDA version, gives 65 GB/s. It seems other people are having the same problem: http://forums.nvidia.com/index.php?showtopic=75817&hl=s870.

Overall, my application runs at about 60% of its desktop G80 speed on the S870, though the results are still correct.

I’ve attached the output from nvidia-bug-report as well. Does anyone have any ideas as to what might be going wrong?
nvidia_bug_report.log.gz (22 KB)

It’s not clear from your post: did you perform the S870 test using the same host system as the discrete G80 test?

Sorry, I should have mentioned that. The S870 is attached to an HP ProLiant DL160 G5 with a single 2.0 GHz Xeon. The G80 test runs on my desktop, which is slightly different: a 2.4 GHz Core 2 Duo on an Asus P5N32-E SLI motherboard.

A quick update with something I just noticed. Only devices 0 and 2 give 31 GB/s; devices 1 and 3 report 60 GB/s:

[tb302@compute-0-0 ~]$ /share/apps/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest --device=0
Running on…
  device 0:Tesla C870

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1989.4

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1737.7

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 31030.3

&&&& Test PASSED
Press ENTER to exit…

[tb302@compute-0-0 ~]$ /share/apps/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest --device=1
Running on…
  device 1:Tesla C870

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2069.6

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1749.4

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 60468.6

&&&& Test PASSED
Press ENTER to exit…

[tb302@compute-0-0 ~]$ /share/apps/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest --device=2
Running on…
  device 2:Tesla C870

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1992.2

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1737.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 31016.8

&&&& Test PASSED
Press ENTER to exit…

[tb302@compute-0-0 ~]$ /share/apps/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest --device=3
Running on…
  device 3:Tesla C870

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2069.9

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1749.7

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 60468.6

&&&& Test PASSED
Press ENTER to exit…
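
For reference, what the device-to-device part of bandwidthTest boils down to is roughly the sketch below. This is my own untested reconstruction against the runtime API, not the actual SDK source; error checking is omitted and the rep count is arbitrary. Run it once per device, the same way bandwidthTest takes --device=N:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int dev = (argc > 1) ? atoi(argv[1]) : 0;   /* device index, like --device=N */
    cudaSetDevice(dev);

    const size_t bytes = 33554432;   /* same transfer size as Quick Mode */
    const int reps = 10;

    void *src = 0, *dst = 0;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    /* Factor of 2: each byte is both read and written on the device,
       which is how the SDK sample reports device-to-device numbers. */
    double mbps = 2.0 * bytes * reps / (ms / 1000.0) / 1e6;
    printf("device %d: %.1f MB/s device-to-device\n", dev, mbps);

    cudaFree(src);
    cudaFree(dst);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

A down-clocked board shows up immediately here: the copy takes twice as long, so the reported number halves.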

I’m seeing something similar on a Quadro Plex Model IV with driver 177.80: GPU 0 reports 30 GB/s while GPU 1 reports 60 GB/s.

Any comments from NVIDIA on this?

Perhaps unrelated, but with 177.80 and bandwidthTest, I got reduced bandwidth on every second run: reduced on the first run, full on the second, reduced on the third, full on the fourth, and so on.

This does not happen with 177.73.

177.73 is the latest CUDA qualified & tested driver.

Check your device clocks with deviceQuery. I bet they are lower than they should be.
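
If you don’t want to dig out the SDK sample, a minimal sketch along these lines (untested, no error checking) shows the same clock information through the runtime API; note that cudaDeviceProp.clockRate is reported in kHz:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* clockRate is in kHz; a healthy C870 should show about 1.35 GHz */
        printf("device %d (%s): %.2f GHz\n", dev, prop.name, prop.clockRate / 1e6);
    }
    return 0;
}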

I have the same problem on a D870 with driver 177.67. Well, sort of: on the D870, GPU 0 is down-clocked and GPU 1 isn’t. I’ve got a bug on file with NVIDIA, but nothing has come of it yet.

It seems the driver decides that since the device isn’t attached to a display and isn’t doing anything useful, it should be down-clocked to save power :)

Until the problem is solved, I’ve reverted to the CUDA 2.0 beta, which works fine. I haven’t tried any newer versions, since I have yet to receive a message saying the bug is closed.

When you say that CUDA 2.0 beta works fine, do you mean that you’re using the older driver?

Yes. I’m running CUDA 2.0 beta 2 and the corresponding driver, 177.13.

I’m seeing the same problem with an S870, CUDA 2.0, 177.73, and a 680i motherboard (P6N Diamond). Devices 0 and 2 are clocked at 1.19 GHz, and their device-to-device bandwidth is half that of devices 1 and 3: around 30 GB/s on devices 0 and 2 versus 60+ GB/s on devices 1 and 3.

I also have the problem on a setup with a D870, CUDA 2.0, 177.73, and an Intel X38 motherboard (DX38BT). Device 0 is clocked at 1.19 GHz and has half the bandwidth of device 1.

The problem does not show up on either system when using CUDA 1.1. I haven’t tried 2.0 beta 2.

Well, CUDA 2.1 is just around the corner, so maybe they ignored this problem entirely for 2.0 and will fix it in 2.1; fingers crossed. It seems a shame that a majority of the original Tesla line cannot be used with CUDA 2.0 in a production setup, despite the problem having been reported from day one. Stupid, if you ask me.

If this persists with 2.1, you can be sure that I’ll be making a lot more noise about it.

Have you tried the S1070 driver? (I know, it seems weird.) The bug is marked as fixed in our database, and the fix should be in the S1070 driver (177.70.18 or whatever), but for whatever reason it’s apparently not in 177.73 or 177.80 as far as I can tell. I’ve been told that the 2.1 drivers will definitely contain the fix, though.

Odd, it doesn’t show as fixed in my bug view; maybe it is tagged that way in the internal system. Also, browsing through, I noticed at least one other duplicate (though filed against 177.73, whereas the bug I posted mentioned the previous driver version).

Thanks for the info on 177.70.18. It works like a charm on the D870 I’ve got here.

Yours is marked as a duplicate of the S870 bug, and I don’t know why no one told you that… anyway, glad to hear that it works on a D870. That fix is definitely in the 2.1 beta driver.

I’ve just discovered this problem with my Tesla S870 and the 177.82 driver. Two of the GPUs are attached to one IBM x3755 and the other two to a different x3755. One x3755 also has a Quadro FX 5800 and the other a 5600. In each x3755, one of the Tesla GPUs clocks at 1.35 GHz and the other at 1.19 GHz. The slower-clocked GPU reports a device-to-device bandwidth around half what it should be (only ~30 GB/s instead of ~60).
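
For what it’s worth, the numbers are at least consistent with the memory clock being halved, not just the shader clock. If I have the specs right, the C870 has 800 MHz GDDR3 on a 384-bit bus, for a theoretical peak of 800 MHz × 2 (DDR) × 48 bytes ≈ 76.8 GB/s. The SDK test counts each byte of a device-to-device copy twice (once read, once written), so ~60 GB/s is about what a healthy board should report, and a board running its memory at half speed would land right around the ~30 GB/s we’re all seeing.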

When will there be a fix?

Both 177.70.18 and 180.22 contain the bandwidth fix. (Well, 177.70.18 definitely does; I haven’t tried 180.22, but The Powers That Be tell me that it does.)

Thanks. 180.22 didn’t work, but 177.70.18 does.

Thanks for that; I am making inquiries.