S1070 with Ubuntu 8.04

Dear CUDA people,

My Tesla S1070, connected to an HP ProLiant 785 G5 server running Ubuntu Linux 8.04, is causing trouble.

After a reboot, no nvidia devices are present in /dev/.

I tried different drivers (177.70.11, 177.70.18, etc.). After running the little script from the CUDA 1.1 release notes, all devices appear in /dev/, but deviceQuery is very slow (it shows the right information only after 5 min.). Furthermore, I get these kernel log messages:

Nov 19 17:03:25 giselle kernel: [ 1026.903369] warning: process `nvidia-installe' used the deprecated sysctl system call with 1.23.

Nov 19 17:04:27 giselle kernel: [ 1088.330866] PCI: Setting latency timer of device 0000:87:00.0 to 64

Nov 19 17:04:27 giselle kernel: [ 1088.331025] PCI: Setting latency timer of device 0000:89:00.0 to 64

Nov 19 17:04:27 giselle kernel: [ 1088.331193] PCI: Setting latency timer of device 0000:c7:00.0 to 64

Nov 19 17:04:27 giselle kernel: [ 1088.331348] PCI: Setting latency timer of device 0000:c9:00.0 to 64

Nov 19 17:04:27 giselle kernel: [ 1088.331502] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 177.70.11 Tue Sep 9 16:26:11 PDT 2008

Nov 19 17:05:23 giselle kernel: [ 1144.096083] Uhhuh. NMI received for unknown reason b1.

Nov 19 17:05:23 giselle kernel: [ 1144.096233] You have some hardware problem, likely on the PCI bus.

Nov 19 17:05:23 giselle kernel: [ 1144.096319] Dazed and confused, but trying to continue

Please see the attached nvidia-bug-report.log. I’m not sure whether this is a hardware failure (host, bridge, or device?) or an OS/driver problem.

Best regards,

Nico.
nvidia_bug_report.zip (38.5 KB)

Does this problem persist with a newer driver?
http://www.nvidia.com/object/linux_display…_177.70.18.html
http://www.nvidia.com/object/linux_display_amd64_180.06.html

Also, can you post the output of ‘lspci’?

Thanks

Thanks for your quick answer.

Both drivers give the same result: no devices in the device list. I also tried Ubuntu 8.10 yesterday and saw the same behaviour.
The logs and the lspci output are attached.

Do you think it’s hardware or software based? Which OS would you recommend for testing the S1070?

Has anybody tried to run a Tesla S1070 with an HP ProLiant 785 G5?

Best regards,
Nico.
TESLAS1070_lspci_out.zip (4.36 KB)

This looks like a known issue with the Broadcom HT2xxx chipset. A future NVIDIA driver release should include a workaround.

Is this HT2100 on the host side or on the Tesla side? When do you expect the first release of that driver?

I have to correct my messages above…

With the new driver 180.06, the Tesla S1070 is running on the HP ProLiant DL 785 with the Broadcom HT2100 chipset.

However, transfers over the PCI-E bus are slow. The attached bandwidthTest output shows low device-to-host speed. The results are the same for all devices. Each GPU by itself runs very fast.

After a reboot, the devices are still not in /dev. I use the script provided with CUDA 1.1 (see the attached getdevices.sh). After that, the system is CUDA-ready.
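
For readers without the attachment: the contents of getdevices.sh are not reproduced in this thread, so the following is only a minimal sketch of what such a script typically does, assuming it follows the pattern from NVIDIA's Linux release notes (character device major 195, one /dev/nvidiaN node per GPU, minor 255 for /dev/nvidiactl). The function names `count_nvidia_gpus` and `nvidia_mknod_cmds` are made up for illustration. The sketch only prints the mknod commands and creates nothing by itself.

```shell
#!/bin/sh
# Sketch only: prints the commands that would recreate the NVIDIA device
# nodes. Assumptions: major number 195, minor 255 for nvidiactl, mode 666.

# Count NVIDIA controllers in lspci-style text read from stdin.
# Tesla GPUs usually show up as "3D controller", display cards as
# "VGA compatible controller"; both are counted.
count_nvidia_gpus() {
  grep -i nvidia | grep -c -E "3D controller|VGA compatible controller"
}

# Print one mknod command per GPU, plus one for the control device.
# $1 = number of GPUs
nvidia_mknod_cmds() {
  n="$1"
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "mknod -m 666 /dev/nvidia$i c 195 $i"
    i=$((i + 1))
  done
  echo "mknod -m 666 /dev/nvidiactl c 195 255"
}
```

After loading the kernel module (`modprobe nvidia`), something like `nvidia_mknod_cmds "$(lspci | count_nvidia_gpus)"` would list the commands for inspection; piping that output to `sh` as root would actually create the nodes. This has to be repeated after every boot on a machine where no X server loads the driver.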

Could you keep me informed about the driver development? Is this low bandwidth a known issue?

Best regards,
Nico.
bandwidthTEST__getdevices.zip (745 Bytes)

The problem is with the Broadcom HT2xxx chipset itself, not the Tesla hardware. The next released driver (after 180.08 or 177.70.18) will include the workaround. The timeframe for that release has not yet been finalized; however, it may be before the end of the year.

This is good news. Thanks for the information.

Nico.