Problem setting up S870

Hi all,

Sorry for the cross post – in retrospect I probably should have posted here instead of the General Discussion board (http://forums.nvidia.com/index.php?showtopic=63942), but hindsight is 20-20. Anyway, I am having problems setting up my S870.

I am trying to set up a Tesla S870 with an HP DL 160 G5 as the host box. I ran the 171-05-pkg2 installer and it ran OK. I ran nvidia-xconfig at the end, which updated my xorg.conf file, but when I tried to restart X it didn’t work, because the DL 160 comes with Matrox onboard video, which is what my monitor is plugged into. I switched the “Driver” in xorg.conf back to “mga” (from “nvidia”) and the video came up, but when I run nvidia-settings, it says I’m not using the nvidia driver and to go back and fix X.

How do I load everything I need to use the S870 while still maintaining the ability to use my onboard video out? I’d like to use the onboard video to plug a display into for when we do some demos of what we’re working on.

Info:
Tesla: S870 (with both host adapters plugged into the two 16x PCI-E slots in my host)
Host: HP DL 160 G5
OS: CentOS 5.1 (RHEL)
CPU: Intel Xeon (x86_64)
RAM: 8GB

Thanks,
–Joe

Please generate and attach an nvidia-bug-report.log which captures the failing configuration.

Attached: nvidia_bug_report.log.txt (197 KB)

This is what we do on our clusters (the nodes have on-board video enabled with S870 attached, like in your configuration):

  1. Get X to work with your on-board adapter.

  2. Install the NVIDIA driver, skipping the X configuration step at the end.

  3. Use this script to load the driver for the Tesla cards:

#!/bin/bash

# Load the NVIDIA kernel module.
modprobe nvidia

if [ "$?" -eq 0 ]; then

    # Count the number of NVIDIA controllers found.
    N3D=$(/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l)
    NVGA=$(/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l)

    # Create one device node per controller (/dev/nvidia0 ... /dev/nvidiaN).
    N=$(expr $N3D + $NVGA - 1)
    for i in $(seq 0 $N); do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done

    # Create the control device node.
    mknod -m 666 /dev/nvidiactl c 195 255

else
    exit 1
fi
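
After running the script, it may be worth confirming that the device nodes actually exist. The following is only a sketch and not part of the script above: check_nvidia_nodes is a hypothetical helper name, and it assumes you already know how many GPUs to expect (four for a fully connected S870).

```shell
#!/bin/bash

# Hypothetical helper (not part of the NVIDIA driver package).
# usage: check_nvidia_nodes <gpu_count> [dev_dir]
# Prints each expected character device that is missing and returns
# non-zero if any are missing. dev_dir defaults to /dev.
check_nvidia_nodes() {
    local count="$1"
    local dev_dir="${2:-/dev}"
    local missing=0
    local i=0

    # One node per GPU: nvidia0 .. nvidia<count-1>
    while [ "$i" -lt "$count" ]; do
        if [ ! -c "$dev_dir/nvidia$i" ]; then
            echo "missing: $dev_dir/nvidia$i"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done

    # The control node shared by all GPUs
    if [ ! -c "$dev_dir/nvidiactl" ]; then
        echo "missing: $dev_dir/nvidiactl"
        missing=$((missing + 1))
    fi

    [ "$missing" -eq 0 ]
}

# Example: check_nvidia_nodes 4 && echo "all device nodes present"
```

This only tests that the nodes are character devices; it does not check the major/minor numbers or whether the kernel module is loaded.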

This was something else I was curious about. In lspci, everything nVidia related shows up as “nVidia Corporation Unknown device.” I see four “3D controllers,” but no “VGA compatible controllers.” Other than the four 3D controllers, I have 20 entries of “PCI bridge: nVidia Corporation Unknown device 05be (rev a2).”

lsmod shows that the nvidia driver is loaded. Is the “Unknown device” description expected?

The C870s show up as 3D controllers, so you are seeing all four GPUs in the S870.

The “Unknown device” label is expected; it takes a while for new PCI IDs to show up in /usr/share/hwdata/pci.ids.
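
Because the device names depend on how current the PCI ID database is, one way to count the GPUs without relying on names is to match the numeric IDs from lspci -n. This is a sketch of an alternative, not what the script above does. It assumes the standard values 10de for the NVIDIA vendor ID, 0302 for the “3D controller” class and 0300 for “VGA compatible controller”; count_nvidia_gpus is a hypothetical function name.

```shell
#!/bin/bash

# Hypothetical helper: count NVIDIA GPUs from `lspci -n` output on stdin.
# Matches class 0300 (VGA) or 0302 (3D controller) followed by vendor 10de.
# PCI bridges (class 0604) are not counted, even when the vendor is 10de.
# The pattern accepts both "0302: 10de:..." and the older
# "Class 0302: 10de:..." output styles.
count_nvidia_gpus() {
    # grep -c prints the count; "|| true" keeps the exit status at 0
    # when the count is zero.
    grep -E -c '(^| )030[02]: 10de:' || true
}

# Example: /sbin/lspci -n | count_nvidia_gpus
```

On a host with an S870 attached through both host adapters and no NVIDIA display card, this would be expected to print 4, matching the four 3D controllers reported above.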

Gotcha. Ok, I’ll try the script to set up the /dev devices and let one of our developers loose to see what he can do.

Thanks for the assistance!

It worked, and we appear to have a happily working Tesla. Thanks for the help.

–Joe