Problems getting CUDA samples running on Red Hat 5.5 (64-bit)

We have a research cluster and just added a new compute node with a GPU so we can start providing GPU capabilities to our users. The server is an IBM iDataPlex running Red Hat Enterprise Linux 5.5 (64-bit). When I run lspci it shows:

[root@node46 ~]# lspci | grep -i nvidia
19:00.0 3D controller: nVidia Corporation GF100 [Tesla S2050] (rev a3)
19:00.1 Audio device: nVidia Corporation GF100 High Definition Audio Controller (rev a1)
1a:00.0 3D controller: nVidia Corporation GF100 [Tesla S2050] (rev a3)
1a:00.1 Audio device: nVidia Corporation GF100 High Definition Audio Controller (rev a1)

So I downloaded and installed NVIDIA-Linux-x86_64-270.41.34.run, cudatoolkit_4.0.17_linux_64_rhel5.5.run, and gpucomputingsdk_4.0.17_linux.run. After installing each of those and adding the appropriate paths to my PATH and LD_LIBRARY_PATH (exact lines below), I rebooted the server to make sure everything was kosher. lsmod shows that the nvidia driver is loaded:

[root@node46 ~]# lsmod | grep -i nvidia
nvidia 10765936 0
i2c_core 57537 3 i2c_ec,i2c_i801,nvidia
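(For the record, the PATH and LD_LIBRARY_PATH additions, assuming the installers' default /usr/local/cuda prefix, were along these lines:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

with lib64 rather than lib since this is the 64-bit toolkit.)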

At this point I went into the NVIDIA_GPU_Computing_SDK/C directory and ran “make x86_64=1” to build all the samples. I then went into NVIDIA_GPU_Computing_SDK/C/bin/linux/release and tried to run matrixMul, as the SDK documentation suggests. However, when I run it I get:

[brucep@node46:~/release] ./matrixMul
[matrixMul] starting...
[ matrixMul ]
./matrixMul Starting (CUDA and CUBLAS tests)...
matrixMul.cu(83) : cudaSafeCall() Runtime API error 38: no CUDA-capable device is detected.

On a whim I tried the same thing as root, and it ran fine:

[root@node46 release]# ./matrixMul
[matrixMul] starting...
[ matrixMul ]
./matrixMul Starting (CUDA and CUBLAS tests)...
Device 0: "Tesla M2050" with Compute 2.0 capability
Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)
Runing Kernels...
> CUBLAS Throughput = 426.0426 GFlop/s, Time = 0.00185 s, Size = 786432000 Ops
> CUDA matrixMul Throughput = 187.0409 GFlop/s, Time = 0.00420 s, Size = 786432000 Ops, NumDevsUsed = 1, Workgroup = 1024
Comparing GPU results with Host computation...
Comparing CUBLAS & Host results
CUBLAS compares OK
Comparing CUDA matrixMul & Host results
CUDA matrixMul compares OK
[matrixMul] test results...
PASSED
Press ENTER to exit...

Once I’ve run it as root, the non-privileged account is able to run the app as well:

[brucep@node46:~/release] ./matrixMul
[matrixMul] starting...
[ matrixMul ]
./matrixMul Starting (CUDA and CUBLAS tests)...
Device 0: "Tesla M2050" with Compute 2.0 capability
Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)
...

I’ve verified that this is easily reproducible: if I reboot the server, non-privileged accounts get the “no CUDA-capable device is detected” error until the root account runs a CUDA app, after which non-privileged users can run CUDA apps without any problems.

What’s the reason for this? How can I get around it? I don’t want to have to set up something to run a CUDA job as root just to let other people run their own jobs…

-Bruce

If the node is not running X, nothing creates the /dev/nvidia* device files at boot. A CUDA application will create them on the fly if it has the privileges to do so, which is why everything works once root has run something, but a non-privileged user can't create them and so gets the "no CUDA-capable device is detected" error. You need to create the /dev/nvidia* entries yourself at boot time.
Look at the Linux release notes; there is a script that does this.
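As a quick check:

ls -l /dev/nvidia*

on a freshly booted node will report "No such file or directory"; once the script (or a root-run CUDA app) has done its work, it should list one device per GPU plus /dev/nvidiactl.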

Thanks. We’re not using X since these are just cluster compute nodes. I just read through /usr/share/doc/NVIDIA_GLX-1.0/README.txt and couldn’t find any mention of a script to set up the /dev entries. I also haven’t had any luck finding such documentation in the CUDA toolkit or GPU SDK downloads. Can you point me to where it is?

-Bruce

http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_Toolkit_Release_Notes.txt

Never mind. I finally found it buried in CUDA_Toolkit_Release_Notes.txt in the CUDA toolkit docs directory. Thanks again.
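For anyone else who hits this, the script in the release notes boils down to loading the nvidia module and then creating one device node per GPU plus the control node, roughly:

#!/bin/bash
# Load the NVIDIA kernel module; give up if that fails.
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the NVIDIA controllers (3D plus VGA) that lspci reports.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  # Create /dev/nvidia0 .. /dev/nvidiaN (character devices, major 195).
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  # Create the control device.
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

Running it from an init script (e.g. /etc/rc.local) at boot means nobody ever has to run a job as root. Note that -m 666 makes the devices world read/write; tighten that if you need to restrict GPU access.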

-Bruce