Tesla S2050 GPU - Unable to start X server and unable to control the fan noise

We have a Tesla S2050 containing 4 GPUs connected to a quad-core Dell R610 rack server.

After a lot of effort, I successfully installed CUDA 5.5.

nvidia-smi gives the following:

Mon Nov 11 12:04:50 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37   Driver Version: 319.37         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla S2050         On   | 0000:0A:00.0     Off |                    0 |
| N/A   56C    P1    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla S2050         On   | 0000:0B:00.0     Off |                    0 |
| N/A   56C    P1    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla S2050         On   | 0000:13:00.0     Off |                    0 |
| N/A   56C    P1    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla S2050         On   | 0000:14:00.0     Off |                    0 |
| N/A   56C    P1    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

However, I am unable to start the X server with init 5 or startx after running nvidia-xconfig, whatever I try.

I also installed all available updates through yum, but that did not help.

In addition, this GPU unit makes an unbearable amount of noise even when no GPU process is running.

Can you please help me control the fan speed, or otherwise reduce the noise?

The X server needs a monitor (or a dummy load) to be able to start. You'd need to post the errors from your Xorg log so that we can help you further.
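To pull just the relevant lines out of the log, something like the following sketch may help. It assumes the log is at /var/log/Xorg.0.log, the usual location for display :0; the function name xorg_problems is only illustrative.

```shell
# Print only the error (EE) and warning (WW) lines from an X server log.
# xorg_problems is a hypothetical helper name; the default path is an
# assumption (the usual location for display :0).
xorg_problems() {
    log="${1:-/var/log/Xorg.0.log}"
    grep -E '\((EE|WW)\)' "$log"
}

# Only run against the real log if it exists and is readable.
# "|| true" keeps the script going when the log has no matching lines.
if [ -r /var/log/Xorg.0.log ]; then
    xorg_problems /var/log/Xorg.0.log || true
fi
```

Posting the lines this prints, plus a few lines of context around the first (EE), is usually enough to start with.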

Try a simple:

as root and try starting X again. If that does not work, you'll have to find out why X is not starting, as I mentioned earlier.

Also, those temperatures seem a bit high if the GPUs are idle. I believe these are passively cooled GPUs, so perhaps you need better airflow or colder air conditioning in your server room.
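To check whether the noise tracks temperature, it may help to record the idle temperatures over time. A minimal sketch that pulls the temperature column out of the default nvidia-smi table; gpu_temps is a hypothetical helper name, and it assumes the table keeps the "56C" style of temperature field shown in this thread.

```shell
# Read nvidia-smi's default table on stdin and print one temperature
# (degrees C, digits only) per GPU row.
# gpu_temps is a hypothetical helper name; it matches whitespace-separated
# fields that look like "56C", which is an assumption about the table layout.
gpu_temps() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+C$/) { sub(/C$/, "", $i); print $i }
    }'
}

# Only call nvidia-smi if it is actually installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | gpu_temps
fi
```

Running this from cron or a simple loop would show whether the temperatures stay flat while the fans stay loud.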

Actually, after restarting, the temperature is only 46C when idle, but the noise is still too high.

[root@teslanode0 ~]# nvidia-smi

Wed Nov 13 10:07:07 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37   Driver Version: 319.37         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla S2050         Off  | 0000:0A:00.0     Off |                    0 |
| N/A   46C    P0    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla S2050         Off  | 0000:0B:00.0     Off |                    0 |
| N/A   46C    P0    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla S2050         Off  | 0000:13:00.0     Off |                    0 |
| N/A   46C    P0    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla S2050         Off  | 0000:14:00.0     Off |                    0 |
| N/A   46C    P0    N/A /  N/A |        6MB /  2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

And the Xorg error is

[root@teslanode0 ~]# nvidia-xconfig

WARNING: Unable to locate/open X configuration file.

New X configuration file written to '/etc/X11/xorg.conf'

[root@teslanode0 ~]# cat /etc/X11/xorg.conf

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 319.37 (buildmeister@swio-display-x64-rhel04-11) Wed Jul 3 18:14:07 PDT 2013

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
    FontPath       "/usr/share/fonts/default/Type1"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from data in "/etc/sysconfig/keyboard"
    Identifier     "Keyboard0"
    Driver         "kbd"
    Option         "XkbLayout" "us_intl"
    Option         "XkbModel" "pc105"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

[root@teslanode0 ~]# startx

xauth: creating new authority file /root/.serverauth.4324

X.Org X Server 1.13.0
Release Date: 2012-09-05
X Protocol Version 11, Revision 0
Build Operating System: c6b8 2.6.32-220.el6.x86_64
Current Operating System: Linux teslanode0.cbt.au 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64
Kernel command line: ro root=UUID=9a5dc365-df5b-4c67-bd21-61a40390a530 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us-acentos crashkernel=512M rhgb quiet rdblacklist=nouveau
Build Date: 15 October 2013 06:54:00PM
Build ID: xorg-x11-server 1.13.0-11.1.el6.centos.2
Current version of pixman: 0.26.2
Before reporting problems, check https://www.redhat.com/apps/support/
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Wed Nov 13 10:11:54 2013
(==) Using config file: "/etc/X11/xorg.conf"
Initializing built-in extension Generic Event Extension
Initializing built-in extension SHAPE
Initializing built-in extension MIT-SHM
Initializing built-in extension XInputExtension
Initializing built-in extension XTEST
Initializing built-in extension BIG-REQUESTS
Initializing built-in extension SYNC
Initializing built-in extension XKEYBOARD
Initializing built-in extension XC-MISC
Initializing built-in extension XINERAMA
Initializing built-in extension XFIXES
Initializing built-in extension RENDER
Initializing built-in extension RANDR
Initializing built-in extension COMPOSITE
Initializing built-in extension DAMAGE
Initializing built-in extension MIT-SCREEN-SAVER
Initializing built-in extension DOUBLE-BUFFER
Initializing built-in extension RECORD
Initializing built-in extension DPMS
Initializing built-in extension X-Resource
Initializing built-in extension XVideo
Initializing built-in extension XVideo-MotionCompensation
Initializing built-in extension SELinux
Initializing built-in extension XFree86-VidModeExtension
Initializing built-in extension XFree86-DGA
Initializing built-in extension XFree86-DRI
Initializing built-in extension DRI2
Loading extension GLX

Fatal server error:
no screens found
(EE)
Please consult the CentOS support
at https://www.redhat.com/apps/support/
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
Server terminated with error (1). Closing log file.
giving up.
xinit: Connection timed out (errno 110): unable to connect to X server
xinit: No such process (errno 3): Server error.

[root@teslanode0 ~]# cat /var/log/Xorg.0.log

[ 589.337]
X.Org X Server 1.13.0
Release Date: 2012-09-05
[ 589.337] X Protocol Version 11, Revision 0
[ 589.337] Build Operating System: c6b8 2.6.32-220.el6.x86_64
[ 589.337] Current Operating System: Linux teslanode0.cbt.au 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64
[ 589.337] Kernel command line: ro root=UUID=9a5dc365-df5b-4c67-bd21-61a40390a530 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us-acentos crashkernel=512M rhgb quiet rdblacklist=nouveau
[ 589.337] Build Date: 15 October 2013 06:54:00PM
[ 589.337] Build ID: xorg-x11-server 1.13.0-11.1.el6.centos.2
[ 589.337] Current version of pixman: 0.26.2
[ 589.337] Before reporting problems, check https://www.redhat.com/apps/support/
to make sure that you have the latest version.
[ 589.337] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 589.337] (==) Log file: "/var/log/Xorg.0.log", Time: Wed Nov 13 10:11:54 2013
[ 589.337] (==) Using config file: "/etc/X11/xorg.conf"
[ 589.337] (==) ServerLayout "Layout0"
[ 589.337] (**) |-->Screen "Screen0" (0)
[ 589.337] (**) |   |-->Monitor "Monitor0"
[ 589.337] (**) |   |-->Device "Device0"
[ 589.337] (**) |-->Input Device "Keyboard0"
[ 589.337] (**) |-->Input Device "Mouse0"
[ 589.337] (==) Automatically adding devices
[ 589.337] (==) Automatically enabling devices
[ 589.337] (==) Not automatically adding GPU devices
[ 589.337] (**) FontPath set to:
/usr/share/fonts/default/Type1,
catalogue:/etc/X11/fontpath.d,
built-ins
[ 589.337] (==) ModulePath set to "/usr/lib64/xorg/modules"
[ 589.337] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[ 589.337] (WW) Disabling Keyboard0
[ 589.337] (WW) Disabling Mouse0
[ 589.337] (II) Loader magic: 0x810020
[ 589.337] (II) Module ABI versions:
[ 589.337] X.Org ANSI C Emulation: 0.4
[ 589.337] X.Org Video Driver: 13.1
[ 589.337] X.Org XInput driver : 18.1
[ 589.337] X.Org Server Extension : 7.0
[ 589.339] (--) PCI: (0:10:0:0) 10de:06de:10de:0773 rev 163, Mem @ 0xd4000000/33554432, 0xac000000/67108864, 0xb0000000/67108864, I/O @ 0x0000ec80/128, BIOS @ 0x???/524288
[ 589.339] (--) PCI: (0:11:0:0) 10de:06de:10de:0773 rev 163, Mem @ 0xd0000000/33554432, 0xa4000000/67108864, 0xa8000000/67108864, I/O @ 0x0000dc80/128, BIOS @ 0x???/524288
[ 589.339] (--) PCI: (0:19:0:0) 10de:06de:10de:0773 rev 163, Mem @ 0xdc000000/33554432, 0xbc000000/67108864, 0xc0000000/67108864, I/O @ 0x0000cc80/128, BIOS @ 0x???/524288
[ 589.339] (--) PCI: (0:20:0:0) 10de:06de:10de:0773 rev 163, Mem @ 0xd8000000/33554432, 0xb4000000/67108864, 0xb8000000/67108864, I/O @ 0x0000bc80/128, BIOS @ 0x???/524288
[ 589.339] (--) PCI:*(0:22:3:0) 102b:0532:1028:0236 rev 10, Mem @ 0xc4000000/8388608, 0xde7fc000/16384, 0xde800000/8388608, BIOS @ 0x???/65536
[ 589.339] Initializing built-in extension Generic Event Extension
[ 589.339] Initializing built-in extension SHAPE
[ 589.339] Initializing built-in extension MIT-SHM
[ 589.339] Initializing built-in extension XInputExtension
[ 589.339] Initializing built-in extension XTEST
[ 589.339] Initializing built-in extension BIG-REQUESTS
[ 589.339] Initializing built-in extension SYNC
[ 589.339] Initializing built-in extension XKEYBOARD
[ 589.339] Initializing built-in extension XC-MISC
[ 589.339] Initializing built-in extension XINERAMA
[ 589.339] Initializing built-in extension XFIXES
[ 589.339] Initializing built-in extension RENDER
[ 589.339] Initializing built-in extension RANDR
[ 589.339] Initializing built-in extension COMPOSITE
[ 589.339] Initializing built-in extension DAMAGE
[ 589.339] Initializing built-in extension MIT-SCREEN-SAVER
[ 589.339] Initializing built-in extension DOUBLE-BUFFER
[ 589.339] Initializing built-in extension RECORD
[ 589.339] Initializing built-in extension DPMS
[ 589.339] Initializing built-in extension X-Resource
[ 589.339] Initializing built-in extension XVideo
[ 589.339] Initializing built-in extension XVideo-MotionCompensation
[ 589.339] Initializing built-in extension SELinux
[ 589.339] Initializing built-in extension XFree86-VidModeExtension
[ 589.339] Initializing built-in extension XFree86-DGA
[ 589.339] Initializing built-in extension XFree86-DRI
[ 589.339] Initializing built-in extension DRI2
[ 589.339] (II) LoadModule: "glx"
[ 589.339] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 589.342] (II) Module glx: vendor="NVIDIA Corporation"
[ 589.342] compiled for 4.0.2, module version = 1.0.0
[ 589.342] Module class: X.Org Server Extension
[ 589.342] (II) NVIDIA GLX Module 319.37 Wed Jul 3 17:20:06 PDT 2013
[ 589.342] Loading extension GLX
[ 589.342] (II) LoadModule: "nvidia"
[ 589.342] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 589.343] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 589.343] compiled for 4.0.2, module version = 1.0.0
[ 589.343] Module class: X.Org Video Driver
[ 589.343] (II) NVIDIA dlloader X Driver 319.37 Wed Jul 3 16:58:33 PDT 2013
[ 589.343] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 589.343] (--) using VT number 7

[ 589.346] (EE) No devices detected.
[ 589.346]
Fatal server error:
[ 589.346] no screens found
[ 589.346] (EE)
Please consult the CentOS support
at https://www.redhat.com/apps/support/
for help.
[ 589.346] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 589.346] (EE)
[root@teslanode0 ~]#

It would help to post the output of "lspci". Did you connect the S2050 with an NVIDIA HIC or GHIC? And which graphics adapter are you using for display output?

[root@teslanode0 ~]# lspci

00:00.0 Host bridge: Intel Corporation 5500 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA Controller [IDE mode] (rev 02)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
04:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
05:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
05:01.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
05:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
05:03.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
08:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
09:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
09:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
0a:00.0 3D controller: NVIDIA Corporation GF100 [Tesla S2050] (rev a3)
0a:00.1 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GF100 [Tesla S2050] (rev a3)
0b:00.1 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
0d:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
0e:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
0e:01.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
0e:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
0e:03.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
11:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
12:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
12:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
13:00.0 3D controller: NVIDIA Corporation GF100 [Tesla S2050] (rev a3)
13:00.1 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
14:00.0 3D controller: NVIDIA Corporation GF100 [Tesla S2050] (rev a3)
14:00.1 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
16:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
[root@teslanode0 ~]#
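
One detail the listing makes visible: lspci prints bus addresses in hexadecimal (0a:00.0), while the X log above shows the same devices in decimal (0:10:0:0), which is also the form the BusID option in xorg.conf expects. The Device section written by nvidia-xconfig has no BusID, and the X log marks the Matrox adapter with an asterisk, which normally denotes the primary device. Pointing the nvidia Device section at one of the Tesla GPUs explicitly may therefore be worth trying; this is a guess from the logs, not a confirmed fix. A small sketch of the conversion, with lspci_to_busid as a hypothetical helper name:

```shell
# Convert an lspci address (hex "bus:dev.fn", e.g. 0a:00.0) into the
# decimal "PCI:bus:dev:fn" form used by the BusID option in xorg.conf.
# lspci_to_busid is a hypothetical helper name.
lspci_to_busid() {
    addr="$1"
    bus="${addr%%:*}"    # "0a"
    rest="${addr#*:}"    # "00.0"
    dev="${rest%%.*}"    # "00"
    fn="${rest#*.}"      # "0"
    # printf %d accepts C-style hex constants such as 0x0a.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_busid 0a:00.0   # PCI:10:0:0
lspci_to_busid 14:00.0   # PCI:20:0:0
```

The result would go into the Device section as a line such as BusID "PCI:10:0:0". nvidia-xconfig should also accept a --busid option that writes this line for you, but check nvidia-xconfig --help on your driver version first.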