Tesla S1070 Bandwidth Problem

Hi All,

I’m having a problem with Host->Device bandwidth on a Tesla S1070. Note that the bandwidth is not balanced: the Device->Host BW is good! Here are the specifics of my configuration:

Motherboard: MSI x48 Platinum, latest BIOS (v2.4)

RedHat 5.2 64-bit

8 GB RAM

CUDA 2.1, Driver version 180.29

Results of BW test (same for all 4 Tesla Devices):

[codebox]trio:~/sdk/bin/linux/release> ./bandwidthTest --memory=pinned --device=0
Running on…
device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1195.6

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5744.7

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73386.1

&&&& Test PASSED

Press ENTER to exit…
[/codebox]

When running under driver 180.22 with a GTX280, I observed 5.7 GB/sec in both directions, so I believe the mobo/BIOS is OK. The only configuration change that I made when setting up the Tesla was to upgrade to driver version 180.29 (the latest available release for Tesla).

1200 MB/sec clearly indicates either a hardware or a system configuration problem.

Any suggestions?

What do you see with 180.22 on the S1070?

Hi tmurray,

Thanks for getting back to me. I get pretty much the same: 1190 MB/sec or so H->D, and around 5400 MB/sec D->H.

Okay, sounds like not a driver problem. Do you have a PCI display card you can use in the same machine, or can you try the S1070 with another motherboard? I will bet that there’s a BIOS problem that causes performance regressions when you have two slots in use.

Good idea. I pulled my graphics card and one of the two Tesla interface cards, and booted in a headless configuration, then VNC’d into the box and ran from there. Same results. Incidentally, I was running successfully before with a graphics card in each slot and getting decent bandwidth numbers.

Any other ideas as to what the problem might be? I’m still dead in the water with only 1200 MB/sec H->D transfer…

Did you verify that you’re using the latest SBIOS?

Hi Netllama,

I assume that SBIOS is not a typo for “BIOS” (I indicated in the original post that my mobo BIOS is updated to the latest version, v2.4). I’m not familiar with SBIOS. Is that something specific to the Tesla? If so, how do I get new images for it and flash them?

Thanks!

Hi All,

I do have some additional information for you. Previously, when I’d checked the performance with the 180.22 driver, I had just re-run the driver installation program, and then modprobe’d the nvidia driver to load it (dmesg and /proc/drivers/nvidia/version both corroborated that it was 180.22). When I did this, I did in fact obtain the same results. However, after a full machine reboot, I obtain different results: bandwidthTest will not run at all. It exits with the following error message:

cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 635 : unspecified launch failure.

Also, on the system monitor (using a generic PCI graphics card):

NVRM: Xid (0007:00): 6, PE0001

NVRM: Xid (0007:00): 6, PE0002

One other data point…even when the Tesla is not being exercised at all, the supplementary fans seem to oscillate from slow to fast with a period of about 30 seconds. This only appears to occur with the 180.22 driver, not the 180.29.

Hope that helps! Is 180.22 indeed supposed to work with the Tesla S1070?

Thanks again!

Hi - is anyone still following this thread? The issues have not improved (no further suggestions). Is this a fundamental problem with the Tesla S1070 hardware? Should I pursue a return/refund?

Try 177.70.33 and CUDA 2.0. The 177.70 branch is our reference for S1070 support:

http://www.nvidia.com/object/linux_display…_177.70.33.html

You could also try 180.44, which just came out today.

I am still leaning towards an SBIOS (system BIOS) problem. BR04 (the PCIe switch/bridge on the HIC and the S1070) basically makes a lot of things appear on the PCIe bus, so it’s entirely possible that such problems would disappear if you just have a couple of graphics cards present. In fact, I’d bet that if I hooked up a PCIe analyzer to the machine when the S1070 was connected, you’d see that PCIe credits were being set incorrectly by the BIOS.
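Flow-control credits can't be seen without an analyzer, but one cheap thing to rule out first is a link that trained at a lower speed or width than the slot supports. Below is a rough sketch that compares the LnkCap and LnkSta lines from `lspci -vv` output. The function names and the sample text are made up for illustration, and the regex assumes the "Speed 5GT/s, Width x16" wording of newer pciutils; older versions (such as the one shipped with RHEL 5) may print these lines differently.

```python
import re

# Assumed line format (newer pciutils):
#   LnkCap: Port #0, Speed 5GT/s, Width x16, ...
#   LnkSta: Speed 2.5GT/s, Width x16, ...
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)\s*GT/s.*?Width\s+x(\d+)")

def parse_links(lspci_text):
    """Return {'LnkCap': (speed_gt_s, width), 'LnkSta': (...)} for one device's text."""
    links = {}
    for line in lspci_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            links[match.group(1)] = (float(match.group(2)), int(match.group(3)))
    return links

def is_downgraded(links):
    """True if the negotiated link (LnkSta) is slower or narrower than LnkCap."""
    cap, sta = links.get("LnkCap"), links.get("LnkSta")
    if cap is None or sta is None:
        return False  # not enough information to tell
    return sta[0] < cap[0] or sta[1] < cap[1]

# Illustrative sample only, not output captured from the machine in this thread.
SAMPLE = """\
        LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+
"""

if __name__ == "__main__":
    links = parse_links(SAMPLE)
    print(links, "downgraded:", is_downgraded(links))
```

To try it on a real box, feed it the output of `lspci -vv` (run as root so the capability blocks are shown) for the HIC's bridge and for each Tesla device. A link that reports full speed and width would point back towards something subtler, like the credit settings mentioned above.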

I don’t see anything to suggest that this is an S1070 defect. This sounds like a motherboard SBIOS (system BIOS) problem. Just because you’re using the latest SBIOS doesn’t mean that it’s not buggy.

Hi - thanks for the response. Can you suggest any motherboards (with dual PCIe 2.0 x16 slots, of course) that people have gotten to work successfully with the Tesla? I’d gladly order a new one and try that. Thanks!

Thanks! I’ll try 180.44 and 177.70 as well. I also posted a reply to netllama…can you suggest some motherboards that are known to work at full speed with the Tesla? That would be a very easy fix…

I don’t have much experience with consumer motherboards and S1070, sorry. I’ve had plenty of luck with C1060, but my experiences with S1070 or Quadro Plex D2 have been much more limited (basically to preconfigured workstations).

Hi tmurray,

I tried both the latest driver (180.44) and rolling back to 177.70.33 (with CUDA 2.0), with no luck: both gave the same results (~1200 MB/sec H->D, ~5600 MB/sec D->H).

I agree with your assessment (more than ever) about the BIOS. In the intervening time I’ve obtained another computer with an EVGA nForce 780i SLI FTW motherboard, but that only gives PCIe 1.0 speeds (~3.1 GB/sec in both directions, to either discrete graphics cards or the Tesla). It appears that the chipset or BIOS on that board doesn’t really deliver 2.0 speeds, but clearly one or both of them had an impact, because the numbers are now at least balanced and match something recognizable (1.0).
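For reference, the theoretical x16 ceilings make these numbers easier to place. This is a minimal sketch, assuming the standard PCIe 1.x/2.0 signalling rates (2.5 and 5 GT/s per lane) with 8b/10b encoding, and treating 1 MB as 10^6 bytes. The function names are just for illustration, and packet/protocol overhead is ignored, so real transfers always land below these peaks.

```python
# Raw per-lane signalling rate in GT/s for PCIe generations 1 and 2.
GT_PER_S = {1: 2.5, 2: 5.0}
# 8b/10b encoding: 8 payload bits for every 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10

def peak_mb_per_s(gen, lanes=16):
    """Peak one-direction bandwidth in MB/s (10**6 bytes), before packet overhead."""
    bits_per_s = GT_PER_S[gen] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6

def fraction_of_peak(measured_mb_per_s, gen, lanes=16):
    """Measured bandwidth as a fraction of the theoretical peak."""
    return measured_mb_per_s / peak_mb_per_s(gen, lanes)

if __name__ == "__main__":
    # Measured values taken from the results reported in this thread.
    cases = [
        ("x48, S1070 H->D vs Gen2", 1195.6, 2),
        ("x48, S1070 D->H vs Gen2", 5744.7, 2),
        ("780i, either way vs Gen1", 3100.0, 1),
    ]
    for label, measured, gen in cases:
        pct = 100 * fraction_of_peak(measured, gen)
        print(f"{label}: {measured:.1f} MB/s is {pct:.0f}% of {peak_mb_per_s(gen):.0f} MB/s")
```

By this arithmetic the ~5.7 GB/sec figure is roughly 72% of a Gen2 x16 link and ~3.1 GB/sec is a comparable share of a Gen1 x16 link, while ~1.2 GB/sec H->D is only about 15% of Gen2, which is why it stands out as a problem and not just ordinary overhead.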

Given that, I need more than ever to find some reliable motherboards that I can just purchase and get moving with (rather than fighting with motherboard/BIOS issues!). Do you have any other resources at NVIDIA that you can access to find other customers who are successfully using their Teslas with motherboards at 2.0 (balanced) speeds? Please feel free to refer me to someone else within NVIDIA if that is more appropriate.

Thanks!

From personal experience, the HP DL160 and the Supermicro Twin 6015TW have x16 Gen 2 slots (2 in the HP; 1 per motherboard in the Supermicro, which has 2 motherboards in 1U) and can deliver more than 5 GB/s using pinned memory.