Cannot find Tesla cards NVIDIA: could not open the device file /dev/nvidia2

Hi,

We have a Tesla S870 1U box plugged into a Red Hat server. Previously the server was connected to just two of the cards, but after a reshuffle it is now connected to all four. However, it has since started giving the following error:

[codebox]
./deviceQuery
NVIDIA: could not open the device file /dev/nvidia2 (No such file or directory).
There is no device supporting CUDA.
[/codebox]

I have checked the cables and tried reinstalling the drivers, just in case. lspci gives the following output:

[codebox]
/sbin/lspci
00:00.0 Host bridge: Intel Corporation 5400 Chipset Memory Controller Hub (rev 20)
00:01.0 PCI bridge: Intel Corporation 5400 Chipset PCI Express Port 1 (rev 20)
00:05.0 PCI bridge: Intel Corporation 5400 Chipset PCI Express Port 5 (rev 20)
00:09.0 PCI bridge: Intel Corporation 5400 Chipset PCI Express Port 9 (rev 20)
00:10.0 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
00:10.1 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
00:10.2 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
00:10.3 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
00:10.4 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
00:11.0 Host bridge: Intel Corporation 5400 Chipset CE/SF Registers (rev 20)
00:15.0 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
00:15.1 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
00:16.0 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
00:16.1 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1c.1 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 2 (rev 09)
00:1c.2 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 3 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
05:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
05:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
06:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
09:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0a:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0a:01.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0a:02.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0a:03.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0c:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0d:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0d:01.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0d:02.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0d:03.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
0f:00.0 3D controller: nVidia Corporation Tesla S870 (Compute Server Component) (rev a2)
11:00.0 3D controller: nVidia Corporation Tesla S870 (Compute Server Component) (rev a2)
14:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
15:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
15:01.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
15:02.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
15:03.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
17:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
18:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
18:01.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
18:02.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
18:03.0 PCI bridge: nVidia Corporation Tesla S870 (rev a2)
1a:00.0 3D controller: nVidia Corporation Tesla S870 (Compute Server Component) (rev a2)
1c:00.0 3D controller: nVidia Corporation Tesla S870 (Compute Server Component) (rev a2)
[/codebox]

If anyone could offer any further thoughts on this, I would appreciate it. I have tried searching the forum, but I get an error telling me that flood control is in operation, so if this has been discussed in an earlier thread, please just point me to it.

Daniel

OK, it turned out that the initial installation used a script to set up the cards, because it was not possible to start X, and this script contained a hard-coded constant for the number of cards. Doh!
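
For anyone who hits the same thing, here is a minimal sketch of how such a setup script could take the card count from lspci instead of a constant. It is not the original script, which was not posted. The function names `count_nvidia_gpus` and `print_mknod_commands` are made up for illustration. It assumes the usual NVIDIA character-device major number 195, with minor 255 for /dev/nvidiactl, and that the nvidia kernel module is already loaded. It only prints the mknod commands, so they can be checked before being run as root.

```shell
#!/bin/sh
# Sketch only: derive the GPU count from lspci rather than hard-coding it.

# Count NVIDIA GPUs in lspci-style text read from stdin.
# The bridges inside the S870 are also listed as "nVidia Corporation",
# so only "3D controller" and "VGA compatible controller" lines are counted.
count_nvidia_gpus() {
    grep -i 'nvidia' | grep -c -E '3D controller|VGA compatible controller'
}

# Print (do not run) the mknod commands for N GPUs plus the control node.
# Assumption: major 195 is the NVIDIA driver's character device,
# and minor 255 is /dev/nvidiactl.
print_mknod_commands() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}
```

It could be used as `print_mknod_commands "$(/sbin/lspci | count_nvidia_gpus)"`. With the lspci output above, which lists four 3D controllers, that would print commands for /dev/nvidia0 through /dev/nvidia3 plus /dev/nvidiactl.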