Problem with kernel driver 174.55 and Tesla C870 on RH5 (testing the 2.0 beta drivers)

Hi,

I’m trying to get the new 2.0 beta kernel drivers (174.55) to work with the Tesla C870. Previous drivers (such as 169.09) appear to work fine with the same C870 hardware. Also unusual is that the beta driver (174.55) does work with a Quadro NVS 290 card (also installed on the machine); it’s just the C870 card that has the problem.

To be more specific, I think the C870 may actually work with the beta drivers but run extremely slowly. For example, the bandwidth test reports device-to-device copies at about 200 MB/s, whereas I was expecting ~64 GB/s.

I’ve included some more information about my system. Thanks in advance for your help.


Linux version 2.6.18-53.1.14.el5 (mockbuild@builder6.centos.org) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Wed Mar 5 11:36:49 EST 2008
CentOS release 5 (Final)

Kernel driver: NVIDIA-Linux-x86-174.55-pkg1.run


./bin/linux/release/bandwidthTest
Using device 1: Tesla C870

Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 199.6

&&&& Test PASSED
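To put the pasted result in proportion, here is a minimal Python sketch that pulls the (transfer size, bandwidth) row out of bandwidthTest text output and compares it with 64000 MB/s, the figure other posters in this thread report for working setups. The function name `parse_rows` and the constant `EXPECTED_MB_S` are hypothetical, and the parser assumes the data row is two whitespace-separated numbers, as in the output above; other bandwidthTest versions may format it differently.

```python
# Hypothetical helper: extract (transfer size, MB/s) rows from bandwidthTest
# text output. Assumes a data row is exactly two whitespace-separated numbers.

def parse_rows(output: str) -> list[tuple[int, float]]:
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            try:
                rows.append((int(parts[0]), float(parts[1])))
            except ValueError:
                continue  # second column was not a number
    return rows

sample = """Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 199.6
"""

# Rate reported for working setups in this thread (same tool, same units,
# so only the ratio matters here).
EXPECTED_MB_S = 64000.0

for size, mb_s in parse_rows(sample):
    print(f"{size} bytes: {mb_s} MB/s, {EXPECTED_MB_S / mb_s:.0f}x below expected")
```

On the sample above this reports the C870 running roughly 321 times slower than expected.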

I have been having similar problems. With a Quadro NVS 290 and a Tesla C870 in a dual-socket 8-core Xeon system, the bandwidth test shows only about 4.4 GB/s of device-to-device bandwidth. This is much lower than the 64 GB/s I get with my 8800 GTX on a Core 2 Duo machine.

I have even tried replacing the Tesla with an 8800 Ultra, and the device-to-device bandwidth is even lower: about 200 MB/s. Is this a problem with the drivers or the hardware? I have tried both CUDA 1.1 and 2.0, and the result is the same with both versions.

@gopher
This is most likely a kernel or BIOS problem. Have you verified that you’re using the latest motherboard BIOS? What kind of motherboard are you using?

Running dmidecode gives me the following information. I have removed most of the extraneous info from the output. The system is a Dell workstation with dual socket Xeon processors.

SMBIOS 2.5 present.
123 structures occupying 4842 bytes.
Table at 0x000F0450.

BIOS Information
    Vendor: Dell Inc.
    Version: A01
    Release Date: 01/31/2008
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 1024 kB
    Characteristics:
            PCI is supported
            PNP is supported
            APM is supported
            BIOS is upgradeable
            BIOS shadowing is allowed
            ESCD support is available
            Boot from CD is supported
            Selectable boot is supported
            EDD is supported
            Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
            3.5"/720 KB floppy services are supported (int 13h)
            Print screen service is supported (int 5h)
            8042 keyboard services are supported (int 9h)
            Serial services are supported (int 14h)
            Printer services are supported (int 17h)
            ACPI is supported
            USB legacy is supported
            BIOS boot specification is supported
            Function key-initiated network boot is supported
            Targeted content distribution is supported
    BIOS Revision: 0.0

Handle 0x0100, DMI type 1, 27 bytes.
System Information
    Manufacturer: Dell Inc.
    Product Name: Precision WorkStation T7400
    Version: Not Specified
    Serial Number: xxxxxxxxxx
    UUID: xxxxxxxxxxxxxxxxxx
    Wake-up Type: Power Switch
    SKU Number: Not Specified
    Family: Not Specified

Handle 0x0200, DMI type 2, 8 bytes.
Base Board Information
    Manufacturer: Dell Inc.
    Product Name: 0RW199
    Version:
    Serial Number: xxxxxxxxxx

Processor Information
    Socket Designation: CPU
    Type: Central Processor
    Family: Xeon
    Manufacturer: Intel
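To confirm which BIOS is installed before and after an update, `dmidecode -s bios-version` (run as root) prints just that field. As a sketch of reading the same information from a full dump like the one above, the hypothetical Python function `bios_info` below collects the single-line "Key: Value" fields of the BIOS Information section (Vendor, Version, Release Date and so on). It assumes the layout shown here and stops at the next "Handle" line.

```python
# Hypothetical parser for the "BIOS Information" section of dmidecode output.
# Assumes the "Key: Value" layout shown above; keys with no inline value
# (such as Characteristics) and the indented capability lines are skipped.

def bios_info(dump: str) -> dict[str, str]:
    info = {}
    in_bios = False
    for raw in dump.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        if line == "BIOS Information":
            in_bios = True
            continue
        if in_bios and line.startswith("Handle "):
            break  # the next DMI structure begins
        if in_bios and ":" in line:
            key, _, value = line.partition(":")
            if value.strip():
                info[key.strip()] = value.strip()
    return info

sample = """BIOS Information
    Vendor: Dell Inc.
    Version: A01
    Release Date: 01/31/2008
    Characteristics:
            PCI is supported
Handle 0x0100, DMI type 1, 27 bytes.
System Information
    Manufacturer: Dell Inc.
"""

print(bios_info(sample))
```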

Is there something wrong with this config?

Have you verified that you’re using the latest motherboard BIOS?
Are you actually using an x16 PCI-E slot?
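One way to check the slot question on Linux is `lspci -vv -s <slot>` run as root, where `<slot>` is the card's bus address from plain `lspci`. For PCI Express devices, pciutils prints a "LnkCap:" line (maximum link) and a "LnkSta:" line (negotiated link), each with a "Width xN" entry. The Python sketch below is a hypothetical helper that extracts both widths from that text; the sample input is made up for illustration (a card that supports x16 but trained at x8), and older lspci versions may not decode these lines at all.

```python
import re
from typing import Dict, Optional

# Hypothetical helper: extract the maximum (LnkCap) and negotiated (LnkSta)
# PCI-E link widths from `lspci -vv` text for a single device.
# A field that is missing from the text stays None.
WIDTH = re.compile(r"Width x(\d+)")

def link_widths(lspci_text: str) -> Dict[str, Optional[int]]:
    widths: Dict[str, Optional[int]] = {"LnkCap": None, "LnkSta": None}
    for line in lspci_text.splitlines():
        field = line.strip().split(":", 1)[0]
        if field in widths:
            match = WIDTH.search(line)
            if match:
                widths[field] = int(match.group(1))
    return widths

# Made-up sample for illustration only.
sample = (
    "\t\tLnkCap:\tPort #0, Speed 2.5GT/s, Width x16, ASPM L0s L1\n"
    "\t\tLnkSta:\tSpeed 2.5GT/s, Width x8, TrErr- Train- SlotClk+\n"
)

w = link_widths(sample)
if w["LnkCap"] and w["LnkSta"] and w["LnkSta"] < w["LnkCap"]:
    print(f"link running at x{w['LnkSta']}, card supports x{w['LnkCap']}")
```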

I was using an x16 slot, but the BIOS version was old. I updated it and things are OK now: I get about 64 GB/s of bandwidth.
Thanks.

I am having a similar problem. I have two Tesla C870s running on Fedora 8 (Linux x86_64).
With the previous driver 169.09 and CUDA 1.1 I had 64000 MB/s of device-to-device bandwidth, but after upgrading to the CUDA 2.0 beta with driver 177.13 I get only 57091 MB/s.
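For scale, a minimal Python sketch of the relative drop between those two figures. Both come from the same bandwidth test, so the units cancel; the function name `percent_drop` is hypothetical.

```python
# Minimal sketch: relative drop between two device-to-device bandwidth
# figures reported by the same test (units cancel out).
def percent_drop(before_mb_s: float, after_mb_s: float) -> float:
    return (before_mb_s - after_mb_s) / before_mb_s * 100.0

# 169.09 + CUDA 1.1 versus 177.13 + CUDA 2.0 beta, as reported above
print(f"{percent_drop(64000, 57091):.1f}% lower")
```

This works out to roughly an 11% loss, far smaller than the 200 MB/s collapse reported earlier in the thread.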

Like gopher, I have a dual-socket 8-core Xeon system (Dell Precision WorkStation T7400). I updated the BIOS to the latest version, but that didn't change anything.
Is anyone else experiencing something similar?

I am experiencing a similar problem with the beta driver 177.13. Also on the Precision T7400.

Edit 6/30:

I tried upgrading the BIOS from A01 to A02, with no luck. I also tried updating the kernel from 2.6.18-53.1.14.el5 to 2.6.18-92.1.6.el5, again with no luck. I believe this is the 32-bit kernel.
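Whether the running kernel is 32-bit can be checked with `uname -m`: i686 (or a similar i*86 name) indicates 32-bit x86, and x86_64 indicates 64-bit. The Python sketch below does the same classification using `platform.machine()`, which reports the machine name from uname. The helper `kernel_bits` and its name sets are assumptions covering common x86 values only.

```python
import platform
from typing import Optional

# Minimal sketch: classify the running kernel as 32- or 64-bit x86 from the
# machine name reported by uname (platform.machine() on Linux).
# The name sets below are an assumption covering common x86 values only.
X86_32 = {"i386", "i486", "i586", "i686"}
X86_64 = {"x86_64", "amd64"}

def kernel_bits(machine: str) -> Optional[int]:
    name = machine.lower()
    if name in X86_32:
        return 32
    if name in X86_64:
        return 64
    return None  # not an x86 name this sketch recognizes

print(platform.machine(), kernel_bits(platform.machine()))
```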

My bandwidth numbers are very similar to Kravell’s.

Kip