Install Problem

Hello,

I’m pretty sure I’m doing something dumb because I’ve only had a GPU computer for 2 days. I’ve installed the driver, Toolkit and SDK under RHEL 5.4. I found one problem with -lglut solved on this forum, but the next error doesn’t seem to fit. When I run make in NVIDIA_GPU_Computing_SDK/C I get the error:

make[1]: Entering directory `/home/mrosing/NVIDIA_GPU_Computing_SDK/C/src/simpleTextureDrv’
/usr/bin/ld: cannot find -lcuda

A list of $LD_LIBRARY_PATH gives:

$ ls -l $LD_LIBRARY_PATH
total 38436
lrwxrwxrwx 1 root root 17 Dec 7 08:58 libcublasemu.so -> libcublasemu.so.3
lrwxrwxrwx 1 root root 21 Dec 7 08:58 libcublasemu.so.3 -> libcublasemu.so.3.0.8
-rwxr-xr-x 1 root root 8120440 Dec 7 08:58 libcublasemu.so.3.0.8
lrwxrwxrwx 1 root root 14 Dec 7 08:58 libcublas.so -> libcublas.so.3
lrwxrwxrwx 1 root root 18 Dec 7 08:58 libcublas.so.3 -> libcublas.so.3.0.8
-rwxr-xr-x 1 root root 21622336 Dec 7 08:58 libcublas.so.3.0.8
lrwxrwxrwx 1 root root 17 Dec 7 08:58 libcudartemu.so -> libcudartemu.so.3
lrwxrwxrwx 1 root root 21 Dec 7 08:58 libcudartemu.so.3 -> libcudartemu.so.3.0.8
-rwxr-xr-x 1 root root 246600 Dec 7 08:58 libcudartemu.so.3.0.8
lrwxrwxrwx 1 root root 14 Dec 7 08:58 libcudart.so -> libcudart.so.3
lrwxrwxrwx 1 root root 18 Dec 7 08:58 libcudart.so.3 -> libcudart.so.3.0.8
-rwxr-xr-x 1 root root 255456 Dec 7 08:58 libcudart.so.3.0.8
lrwxrwxrwx 1 root root 16 Dec 7 08:58 libcufftemu.so -> libcufftemu.so.3
lrwxrwxrwx 1 root root 20 Dec 7 08:58 libcufftemu.so.3 -> libcufftemu.so.3.0.8
-rwxr-xr-x 1 root root 1676224 Dec 7 08:58 libcufftemu.so.3.0.8
lrwxrwxrwx 1 root root 13 Dec 7 08:58 libcufft.so -> libcufft.so.3
lrwxrwxrwx 1 root root 17 Dec 7 08:58 libcufft.so.3 -> libcufft.so.3.0.8
-rwxr-xr-x 1 root root 7284272 Dec 7 08:58 libcufft.so.3.0.8

libcuda isn’t there. What did I do wrong?
Thanks,
Mike

libcuda is supplied as part of the driver package, and should be in /usr/lib. If it isn’t, that is a sign that your driver installation didn’t work, or that you used a third-party driver repackage that doesn’t include libcuda (or puts it in a nonstandard place).
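
A quick way to check this is to look for libcuda in the usual library directories. The following is only a sketch: the function name find_libcuda is made up for illustration, and the directories to search are whatever you pass in.

```shell
# Print every libcuda.so* found in the directories given as arguments.
# Returns non-zero when nothing is found.
find_libcuda() {
    local dir f found=1
    for dir in "$@"; do
        for f in "$dir"/libcuda.so*; do
            # An unmatched glob stays literal, so test that the entry exists
            # (-L also catches a dangling symlink, which is worth seeing).
            if [ -e "$f" ] || [ -L "$f" ]; then
                echo "$f"
                found=0
            fi
        done
    done
    return "$found"
}
```

For example: find_libcuda /usr/lib /usr/lib64 || echo "libcuda not found, check the driver install". Note that the pattern does not match libcudart, which comes from the Toolkit rather than the driver.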

Thanks, I will double check the driver install. Once I reboot, how do I get NVIDIA X server Settings to show up? Is that under “Applications” or “System”? The fact that I don’t see it is a clue that the driver did not install.

Try running nvidia-settings from a terminal.

You should also add /usr/local/cuda/lib to your LD_LIBRARY_PATH.

N.

I definitely did not install the Nvidia drivers. When I did get the .run file to execute it told me I can’t install under Xen. So now I have to go back to RedHat, load the sources and rebuild the kernel without Xen.

I do prefer to run from terminal, so once I get things installed correctly I’ll try that.

Thanks!

Well, I’m slowly getting somewhere. I got the Nvidia driver to load after installing the kernel-devel rpm (and telling the .run where to find it). But now when I run nvidia-settings it gives me an error

"You do not appear to be using the NVIDIA X driver. Please edit your X configuration file (just run nvidia-xconfig as root), and restart the X server. "
I ran nvidia-xconfig as root and rebooted, but it gives me the same error. Here is the output from nvidia-xconfig:

Using X configuration file: “/etc/X11/xorg.conf”.

WARNING: Unable to find CorePointer in X configuration; attempting to add new
CorePointer section.

WARNING: The CorePointer device was not specified explicitly in the layout;
using the first mouse device.

Backed up file ‘/etc/X11/xorg.conf’ as ‘/etc/X11/xorg.conf.backup’
New X configuration file written to ‘/etc/X11/xorg.conf’

And here is what the file xorg.conf has:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 1.0 (buildmeister@builder62) Wed Jul 22 16:45:17 PDT 2009

# Xorg configuration created by system-config-display

Section "ServerLayout"
    Identifier "single head configuration"
    Screen 0 "Screen0" 0 0
    InputDevice "Mouse0" "CorePointer"
    InputDevice "Keyboard0" "CoreKeyboard"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/input/mice"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    Identifier "Keyboard0"
    Driver "kbd"
    Option "XkbModel" "pc105"
    Option "XkbLayout" "us"
EndSection

Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Unknown"
    HorizSync 28.0 - 33.0
    VertRefresh 43.0 - 72.0
    Option "DPMS"
EndSection

Section "Device"
    Identifier "Videocard0"
    Driver "nvidia"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Videocard0"
    Monitor "Monitor0"
    DefaultDepth 24
    SubSection "Display"
        Viewport 0 0
        Depth 24
        Modes "1024x768" "800x600" "640x480"
    EndSubSection
EndSection

This machine has 4 Tesla cards. I don’t understand what the X server has to do with that. How do I get past this error?
Thanks!

You have 4 Tesla cards, but what are you using for a display adapter (given none of the Tesla cards can drive the monitor)?

It looks like it is a Matrox G200e. The driver for that is installed via an rpm already. I’m also running VNC, which replicates the external console. That might also confuse things.

Then you definitely should not have accepted the NVIDIA driver’s offer to set up X11 for you… You now have an X11 setup for a card you can’t use. Usually the installer drops a backup of the xorg.conf file somewhere. If you revert to that, you should be OK.
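
To see which driver an xorg.conf actually points the display at, before and after reverting, something like the sketch below may help. The function name xorg_device_drivers is hypothetical; the backup path /etc/X11/xorg.conf.backup is the one nvidia-xconfig reported earlier in this thread.

```shell
# Print the Driver named in each "Device" section of an xorg.conf-style file.
# InputDevice sections (mouse, keyboard) are deliberately ignored.
xorg_device_drivers() {
    awk '
        tolower($1) == "section"    { in_device = (tolower($2) == "\"device\"") }
        tolower($1) == "endsection" { in_device = 0 }
        in_device && tolower($1) == "driver" { gsub(/"/, "", $2); print $2 }
    ' "$1"
}
```

Running xorg_device_drivers /etc/X11/xorg.conf on the file posted above would print nvidia; running it on the backup shows what the display was using before. If the backup names the right driver for the Matrox card, copying it back over /etc/X11/xorg.conf (as root) is the revert.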

OK, but that puts me back where I started - nvidia-settings gives me an error. I have put the backup to xorg.conf and moved the nvidia version to xorg.conf.nvidia. I just ignore the error?

No, you have more work left to do. Because you aren’t running the NVIDIA X11 driver, the underlying kernel module isn’t loaded at startup and the device files that CUDA applications need won’t exist, so you will have to install a script that loads the module and creates them at boot time.

How to do that is discussed in the release notes. You did read those, right?
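
For reference, a boot-time script of that kind generally loads the kernel module and then creates one /dev/nvidiaN node per GPU plus /dev/nvidiactl. The sketch below follows that outline but is not the script from the release notes; the function names are made up, and the major number 195 and minor 255 are taken from the mknod commands used later in this thread. It only prints the mknod commands so they can be inspected first.

```shell
# Count NVIDIA GPUs in `lspci` output read from stdin.
# Compute-only cards are assumed to show up as "3D controller" entries.
count_nvidia_gpus() {
    grep -i nvidia | grep -c -e "3D controller" -e "VGA compatible controller"
}

# Print the mknod commands for N GPUs plus the control node.
nvidia_mknod_commands() {
    local n=$1 i=0
    while [ "$i" -lt "$n" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}
```

As root, the pieces could be combined as: /sbin/modprobe nvidia && nvidia_mknod_commands "$(/sbin/lspci | count_nvidia_gpus)" | sh. On RHEL 5, /etc/rc.d/rc.local is one place such commands can run at the end of boot; that choice is an assumption here, not something the release notes are quoted as saying.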

Not those. I’ve been trying to get through CUDA_SDK_release_notes_linux.txt. I did get all the samples to compile, so it looks like I’m getting a lot closer. But when I run deviceQuery it gives me an error:

./deviceQuery: error while loading shared libraries: libcudart.so.3: cannot open shared object file: No such file or directory

which doesn’t make sense because I’ve got LD_LIBRARY_PATH set correctly and the file is there. I haven’t gotten through the whole “getting started” pdf yet.

If you’re running x64, you need to set LD_LIBRARY_PATH to /usr/local/cuda/lib64

N.
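
A small sketch of picking the directory by architecture, assuming the default /usr/local/cuda install prefix mentioned in this thread (the function name cuda_lib_dir is hypothetical):

```shell
# Print the CUDA library directory for a machine type as reported by `uname -m`.
# The second argument overrides the install prefix.
cuda_lib_dir() {
    local prefix=${2:-/usr/local/cuda}
    case "$1" in
        x86_64) echo "$prefix/lib64" ;;
        *)      echo "$prefix/lib" ;;
    esac
}
```

Usage: export LD_LIBRARY_PATH="$(cuda_lib_dir "$(uname -m)")${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}". The variable has to be exported in the same shell that later runs the program.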

Yup - got that:

$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64

$ ls -l /usr/local/cuda/lib64
total 38436
lrwxrwxrwx 1 root root 17 Dec 7 08:58 libcublasemu.so -> libcublasemu.so.3
lrwxrwxrwx 1 root root 21 Dec 7 08:58 libcublasemu.so.3 -> libcublasemu.so.3.0.8
-rwxr-xr-x 1 root root 8120440 Dec 7 08:58 libcublasemu.so.3.0.8
lrwxrwxrwx 1 root root 14 Dec 7 08:58 libcublas.so -> libcublas.so.3
lrwxrwxrwx 1 root root 18 Dec 7 08:58 libcublas.so.3 -> libcublas.so.3.0.8
-rwxr-xr-x 1 root root 21622336 Dec 7 08:58 libcublas.so.3.0.8
lrwxrwxrwx 1 root root 17 Dec 7 08:58 libcudartemu.so -> libcudartemu.so.3
lrwxrwxrwx 1 root root 21 Dec 7 08:58 libcudartemu.so.3 -> libcudartemu.so.3.0.8
-rwxr-xr-x 1 root root 246600 Dec 7 08:58 libcudartemu.so.3.0.8
lrwxrwxrwx 1 root root 14 Dec 7 08:58 libcudart.so -> libcudart.so.3
lrwxrwxrwx 1 root root 18 Dec 7 08:58 libcudart.so.3 -> libcudart.so.3.0.8
-rwxr-xr-x 1 root root 255456 Dec 7 08:58 libcudart.so.3.0.8
lrwxrwxrwx 1 root root 16 Dec 7 08:58 libcufftemu.so -> libcufftemu.so.3
lrwxrwxrwx 1 root root 20 Dec 7 08:58 libcufftemu.so.3 -> libcufftemu.so.3.0.8
-rwxr-xr-x 1 root root 1676224 Dec 7 08:58 libcufftemu.so.3.0.8
lrwxrwxrwx 1 root root 13 Dec 7 08:58 libcufft.so -> libcufft.so.3
lrwxrwxrwx 1 root root 17 Dec 7 08:58 libcufft.so.3 -> libcufft.so.3.0.8
-rwxr-xr-x 1 root root 7284272 Dec 7 08:58 libcufft.so.3.0.8

the system is set up right (sort of).

I found the answer here: my ldconfig was not set up correctly. Adding /usr/local/cuda/lib64 to ld.so.conf and running ldconfig fixed the inability to find the library.
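
In case it helps someone else, here is a sketch of that fix. The helper name add_lib_dir is made up; it appends the directory to a config file only if it is not already listed, so it is safe to run twice.

```shell
# Append a library directory to an ld.so.conf-style file unless it is
# already listed there.
add_lib_dir() {
    local dir=$1 conf=$2
    # -x matches the whole line, -F treats the path as a literal string.
    if ! grep -qxF "$dir" "$conf" 2>/dev/null; then
        echo "$dir" >> "$conf"
    fi
}
```

As root: add_lib_dir /usr/local/cuda/lib64 /etc/ld.so.conf && /sbin/ldconfig. Afterwards, ldconfig -p | grep libcudart should list the library, and ldd ./deviceQuery should no longer report it as not found.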

So now I have the problem that there are no devices supporting CUDA, so I have to create them using the script.
Can it only run at boot? And when during boot (i.e. which startup file does it go into)?

I tried this by hand:

cd /dev
mknod -m 666 nvidia0 c 195 0
mknod -m 666 nvidia1 c 195 1
mknod -m 666 nvidia2 c 195 2
mknod -m 666 nvidia3 c 195 3
mknod -m 666 /dev/nvidiactl c 195 255
ls -l nvi*
crw-rw-rw- 1 root root 195, 0 Dec 8 14:48 nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 8 14:48 nvidia1
crw-rw-rw- 1 root root 195, 2 Dec 8 14:49 nvidia2
crw-rw-rw- 1 root root 195, 3 Dec 8 14:49 nvidia3
crw-rw-rw- 1 root root 195, 255 Dec 8 14:51 nvidiactl

but deviceQuery still comes back with no devices. What else do I need to run? And now that these files are created in /dev, is there really a need to create them every time at boot?

Run modprobe nvidia to load the kernel module.

And yes, the nodes have to be recreated at every boot. /dev entries haven’t been persistent in Linux for a very, very long time…

Shows how old I am :-) Where should I put the script and how do I get the boot process to execute it? If I just stick it in init.d will it auto run?

Edit: when I do it by hand, lsmod shows nvidia loaded and I have all the files in /dev, but deviceQuery still doesn’t find anything. What is magic about doing it at boot?

OK, here’s where I’m at as superuser:

[root@bouredhat ~]# lsmod | grep nvidia
nvidia 9715432 0
i2c_core 56129 3 i2c_ec,nvidia,i2c_i801
[root@bouredhat ~]# ls -l /dev/nvi*
crw-rw-rw- 1 root root 195, 0 Dec 8 15:43 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 8 15:43 /dev/nvidia1
crw-rw-rw- 1 root root 195, 2 Dec 8 15:43 /dev/nvidia2
crw-rw-rw- 1 root root 195, 3 Dec 8 15:43 /dev/nvidia3
crw-rw-rw- 1 root root 195, 255 Dec 8 15:44 /dev/nvidiactl

[root@bouredhat ~]# setenforce 0

[root@bouredhat mrosing]# cd NVIDIA_GPU_Computing/C/bin/linux/release

[root@bouredhat release]# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is no device supporting CUDA

Test PASSED

Press ENTER to exit…

I have the devices set up according to the script, I have the driver loaded, I have all the libraries in the right places, and it still can’t find the cards. What am I missing?
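
For anyone repeating these checks, the device-node part can be scripted. This is only a sketch with a made-up function name; it reads ls -l output and flags anything that is not a character device with major number 195, the number used in the mknod commands above.

```shell
# Read `ls -l` lines for device nodes on stdin and report any entry that is
# not a character device with major number 195. Exits non-zero if any is found.
check_nvidia_nodes() {
    awk '
        { major = $5; sub(/,$/, "", major) }
        substr($1, 1, 1) != "c" || major + 0 != 195 {
            print "unexpected: " $NF
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Usage: ls -l /dev/nvidia* | check_nvidia_nodes && echo "nodes look right". The listing above passes this check, so the nodes themselves are not what is wrong here. Reading /proc/driver/nvidia/version is another way to confirm the kernel module is loaded.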


Try running

nvidia-smi -lsa

first, and then run deviceQuery.