2 Tesla C1060s with a legacy GeForce FX 5200 card Need help editing the xorg.conf file for multiple

netllama · January 27, 2009, 2:53pm

Ok… to make it easier, I’ve switched to a GeForce 6200 and used the default xorg.conf. But, the driver craps out as follows:

[codebox]Setting vga for screen 0.

(**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32

(==) NVIDIA(0): RGB weight 888

(==) NVIDIA(0): Default visual is TrueColor

(==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)

(**) NVIDIA(0): Enabling RENDER acceleration

(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is

(II) NVIDIA(0): enabled.

(EE) NVIDIA(0): Failed to initialize the NVIDIA graphics device!

(II) UnloadModule: "nvidia"

(II) UnloadModule: "wfb"

(II) UnloadModule: "fb"

(EE) Screen(s) found, but none have a usable configuration.

Fatal server error:

no screens found[/codebox]

There are also these kernel errors:

[codebox] /var/log/messages:

Jan 26 18:15:58 gpu2 kernel: [ 333.639010] NVRM: request_mem_region failed for 16M @ 0xfb000000. This can

Jan 26 18:15:58 gpu2 kernel: [ 333.639011] NVRM: occur when a driver such as rivatv is loaded and claims

Jan 26 18:15:58 gpu2 kernel: [ 333.639012] NVRM: ownership of the device’s registers.

Jan 26 18:15:58 gpu2 kernel: [ 333.639026] NVRM: The NVIDIA probe routine failed for 1 device(s).

Jan 26 18:15:58 gpu2 kernel: [ 333.639028] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 180.06 Sat Nov 8 17:50:38 PST 2008

Jan 26 18:16:43 gpu2 kernel: [ 378.827720] NVRM: request_mem_region failed for 16M @ 0xfb000000. This can

Jan 26 18:16:43 gpu2 kernel: [ 378.827721] NVRM: occur when a driver such as rivatv is loaded and claims

Jan 26 18:16:43 gpu2 kernel: [ 378.827722] NVRM: ownership of the device’s registers.

Jan 26 18:16:43 gpu2 kernel: [ 378.827737] NVRM: The NVIDIA probe routine failed for 1 device(s).

Jan 26 18:16:43 gpu2 kernel: [ 378.827739] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 180.06 Sat Nov 8 17:50:38 PST 2008

Jan 26 18:18:34 gpu2 kernel: [ 44.812830] NVRM: request_mem_region failed for 16M @ 0xfb000000. This can

Jan 26 18:18:34 gpu2 kernel: [ 44.812830] NVRM: occur when a driver such as rivatv is loaded and claims

Jan 26 18:18:34 gpu2 kernel: [ 44.812831] NVRM: ownership of the device’s registers.

Jan 26 18:18:34 gpu2 kernel: [ 44.812844] NVRM: The NVIDIA probe routine failed for 1 device(s).

Jan 26 18:18:34 gpu2 kernel: [ 44.812845] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 180.06 Sat Nov 8 17:50:38 PST 2008

[/codebox]

I’ve been googling these errors for a while now, with no luck. Any kind of insight would be appreciated. If you can provide a workable xorg.conf, that would be awesome! Attached is the full bug report.

This isn’t an xorg.conf problem. It’s an issue with your system. This looks like an SBIOS bug. Verifying that you have the latest SBIOS would be a good idea.

I should also note that 180.06 is no longer supported, and Ubuntu 8.10 isn’t supported at all right now with CUDA. Testing with the latest released driver would be a good idea.
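
The `request_mem_region failed for 16M @ 0xfb000000` message in the kernel log means something else already holds that address range. One way to see what holds it is to look the address up in `/proc/iomem`. The following is a minimal sketch, not an NVIDIA tool: the helper name `iomem_owner` is hypothetical, and it assumes `/proc/iomem` uses its usual `start-end : owner` layout.

```shell
#!/bin/sh
# Hypothetical helper: print the /proc/iomem-style lines (read from stdin)
# whose address range contains the address given as $1 (e.g. 0xfb000000).
iomem_owner() {
    addr=$(( $1 ))
    while IFS= read -r line; do
        # Keep only the "start-end" part and drop the indentation.
        range=${line%% :*}
        range=$(echo "$range" | tr -d ' ')
        # Skip anything that is not a plain hex range.
        case $range in
            ''|*[!0-9a-fA-F-]*) continue ;;
        esac
        start=${range%-*}
        end=${range#*-}
        if [ "$addr" -ge $((0x$start)) ] && [ "$addr" -le $((0x$end)) ]; then
            echo "$line"
        fi
    done
}

# Usage (run as root so the real addresses are visible):
#   iomem_owner 0xfb000000 < /proc/iomem
# It may also be worth checking whether a conflicting driver is loaded:
#   lsmod | grep -E 'rivatv|rivafb|nvidiafb'
```

If the range shows up under an unexpected owner, or does not show up under the GPU's PCI address at all, that would be consistent with the BIOS assigning resources badly, as suggested above.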

ckhw2 · January 28, 2009, 3:34am

Do you mean system BIOS? I have the latest system BIOS. But it could still be that. It’s a finicky X58-based motherboard from MSI that was released too soon. They should have tested it with standard PCI graphics cards. I had earlier problems with RAM and clock speed.

Currently, I’m trying to start Ubuntu without starting X (I created a custom runlevel 3 by disabling gdm), and I’ll modprobe the nvidia devices to get CUDA working on them. I’ll post on how that goes, and hopefully we can close this thread.

netllama · January 28, 2009, 2:42pm

Yes, SBIOS = system BIOS. I’d be pleasantly surprised if your workaround for not starting X helped. Your motherboard doesn’t seem capable of accessing all the GPUs correctly from a low level.

ckhw2 · January 29, 2009, 12:37am

That’s a scary thought. But, I have gotten this far since yesterday:

Installed the latest 180.22 drivers for CUDA 2.1 and also the toolkit and SDK; ran “make” on the SDK to get the binaries

Disabled gdm in run level 3 (using sysv-rc-conf) and made it the default run level by editing /etc/inittab

Replaced the existing xorg.conf with the custom-made version attached

Edited the /etc/rc.local script to call the attached cuda.sh script (that I found online) to “modprobe nvidia” and to add the /dev/nvidia* entries

On trying to run any CUDA app, it craps out and shows a version conflict between the kernel module and the driver, which I was hoping someone might know how to fix.

[codebox]cyriac@gpu2:~$ ./NVIDIA_CUDA_SDK/bin/linux/release/deviceQuery

Error: API mismatch: the NVIDIA kernel module has version 96.43.05,

but this NVIDIA driver component has version 180.22. Please make

sure that the kernel module and all NVIDIA driver components

have the same version.

cudaSafeCall() Runtime API error in file <deviceQuery.cu>, line 59 : initialization error.

cyriac@gpu2:~$[/codebox]

It also seems odd that there is only one nvidia entry in /dev, when in fact lspci clearly shows the two Tesla C1060 cards. So shouldn’t there be an “nvidia1” /dev entry too?

[codebox]cyriac@gpu2:~$ ls /dev/nvidia*

/dev/nvidia0 /dev/nvidiactl

cyriac@gpu2:~$ lspci

…

02:00.0 3D controller: nVidia Corporation Unknown device 05e7 (rev a1)

03:00.0 3D controller: nVidia Corporation Unknown device 05e7 (rev a1)

…

0a:00.0 VGA compatible controller: ATI Technologies Inc RV 610LE PCI [Radeon HD 2400]

0a:00.1 Audio device: ATI Technologies Inc RV610 audio device [Radeon HD 2400 PRO][/codebox]

At this point, I dunno how to proceed. If you are familiar with the problems above, please do let me know.

Edit: Also note that I’ve switched back to the Radeon HD 2400. This one doesn’t appear to have the SBIOS issues that netllama mentioned, when I used a GeForce 6200.

Thanks,

Cyriac
cuda.sh.txt (1.67 KB)
xorg.conf.txt (2.09 KB)

netllama · January 29, 2009, 12:40am

Ubuntu ships 96.43.05. They have a ‘feature’ which reinstalls it and reconfigures X upon rebooting.
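
To confirm which kernel module version is actually loaded after a reboot, compare what the module reports with the version you installed. The following is a minimal sketch, assuming the loaded module exposes its version in `/proc/driver/nvidia/version`; the helper names `nvrm_version` and `check_versions` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version from an NVRM banner line
# on stdin, e.g. "... NVIDIA UNIX x86_64 Kernel Module 180.06 Sat Nov 8 ...".
nvrm_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: compare the loaded module version ($1) with the
# version of the driver package you installed ($2).
check_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
        return 0
    fi
    echo "mismatch: kernel module $1, installed driver $2"
    return 1
}

# Usage (assumes the nvidia module is loaded):
#   check_versions "$(nvrm_version < /proc/driver/nvidia/version)" 180.22
```

If this reports 96.43.05 after you installed 180.22, the distribution's module was loaded instead of the one you built.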

ckhw2 · January 29, 2009, 4:29pm

So how do I get around this ‘feature’? Does it involve recompiling the kernel?

Edit: I’ll try and follow these instructions and post on how that goes.

ckhw2 · January 29, 2009, 8:31pm

Ok. I managed to replace the driver modules shipped with Ubuntu with the current modules. But now I get this error:

[codebox]cyriac@gpu2:~$ ./NVIDIA_CUDA_SDK/bin/linux/release/deviceQuery

NVIDIA: could not open the device file /dev/nvidia1 (No such file or directory).

cudaSafeCall() Runtime API error in file <deviceQuery.cu>, line 59 : initialization error.

cyriac@gpu2:~$

[/codebox]

Does anyone know why the following script that is called from /etc/rc.local does not create a /dev/nvidia1 entry for the second Tesla?

[codebox]

modprobe nvidia

# Count the number of NVIDIA controllers found.

N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`

NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

# Make /dev entries for each nvidia card in the system

N=`expr $N3D + $NVGA - 1`

for i in `seq 0 $N`; do
   mknod -m 666 /dev/nvidia$i c 195 $i
done

mknod -m 666 /dev/nvidiactl c 195 255

[/codebox]

netllama · January 29, 2009, 8:39pm

Was /dev/nvidia1 created? If not, then you need to create it.

ckhw2 · January 29, 2009, 9:28pm

Thanks! I needed to debug my mod-probing script. lspci was at /usr/bin/lspci and not at /sbin/lspci. So it now adds both /dev entries correctly and I have CUDA apps running on the two Teslas (without starting X), while the Radeon 2400 is used for display. Surprisingly, I was able to ‘startx’ which properly loaded the ‘radeon’ drivers for the HD2400 and the ‘nvidia’ drivers for the Teslas. So awesome! :D
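
The script failed because of a hard-coded path to lspci, which can be avoided by resolving the path at run time. The following is a sketch of that idea, not the script attached to this thread: the function names `count_nvidia_gpus` and `print_mknod_commands` are hypothetical, and the device numbers (major 195, minor 255 for nvidiactl) are taken from the script posted above.

```shell
#!/bin/sh
# Hypothetical helper: count NVIDIA GPUs in lspci output read from stdin.
count_nvidia_gpus() {
    grep -i nvidia | grep -c -E '3D controller|VGA compatible controller'
}

# Hypothetical helper: print the mknod commands for N GPUs plus the control
# node, so they can be reviewed before being run as root.
print_mknod_commands() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Usage (as root, after "modprobe nvidia"):
#   LSPCI=$(command -v lspci) || { echo "lspci not found" >&2; exit 1; }
#   print_mknod_commands "$("$LSPCI" | count_nvidia_gpus)" | sh
```

Printing the commands first makes a wrong count easy to spot: with a missing lspci the original script counted zero devices without reporting an error, so no /dev/nvidia1 node was created.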

Maybe next I’ll try to load the fglrx drivers for the HD2400 in X. It might even be possible to run GL apps on the Teslas and extract and display images from its frame buffer. Yikes! But that’s all for another thread another time.

Thanks for all your help.
