No CUDAdevice is available, help

cudars · January 7, 2010, 2:53am

ubuntu 9.04
cuda driver 190.18
cuda toolkit 2.3
The installation of CUDA went fine.
When I run the CUDA SDK sample convolutionSeparable, I get an error:
cudaSafeCall() Runtime API error in file <main.cpp>, line 215: no CUDAdevice is available.
The other SDK samples cannot be executed either.
How can I deal with this problem?
Help!

PS: I have not installed X Windows. Should I install it?

avidday · January 7, 2010, 7:12am

No, but that (at least indirectly) is the source of your problem. If you read the toolkit release notes you will see this:

o In order to run CUDA applications, the CUDA module must be
  loaded and the entries in /dev created.  This may be achieved
  by initializing X Windows, or by creating a script to load the
  kernel module and create the entries.

If you keep reading it will tell you what must be done…

cudars · January 7, 2010, 8:34am

Thanks for helping.

I read the toolkit release notes and added the script to the init that resides in the initrd, then got a kernel panic.

Where is the right place to add the script provided in the toolkit release notes?

avidday · January 7, 2010, 8:44am

Add it to a script right at the end of the boot process, like rc.local or something. It doesn’t need to be done early. After all, CUDA is userspace, and until you can log in nothing is going to need it.

cudars · January 7, 2010, 9:58am

I added the script to rc.local.

On my Ubuntu system, lspci is located in /usr/bin, so I modified the script to:

modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  N3D=`/usr/bin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
  NVGA=`/usr/bin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i;
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi
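
As a hedged illustration of the counting step in the script above: the functions below read lspci-style text on standard input, so the logic can be tried on sample lines without any NVIDIA hardware or root access. The helper names count_nvidia_controllers and device_minors are hypothetical and do not come from the release notes.

```shell
#!/bin/sh
# Sketch only: count_nvidia_controllers and device_minors are
# hypothetical helper names, not part of the toolkit script.

# Read lspci-style lines on stdin and print how many of them are
# NVIDIA "3D controller" or "VGA compatible controller" entries.
count_nvidia_controllers() {
    grep -i nvidia | grep -c -e "3D controller" -e "VGA compatible controller"
}

# Print the minor numbers 0 .. COUNT-1, one per line. These are the
# suffixes the script uses for /dev/nvidia0, /dev/nvidia1, and so on.
device_minors() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$i"
        i=$((i + 1))
    done
}
```

On a real machine the first function would be fed from lspci, for example: /usr/bin/lspci | count_nvidia_controllers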

However, it seems that the script did not work.

I got the same error: cudaSafeCall() Runtime API error in file <main.cpp>, line 215: no CUDAdevice is available.

Running ls | grep nvidia in /dev shows nothing.

Is something wrong?
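
A quick way to confirm whether the device nodes were created is sketched below. It only tests for the two names the script above is meant to create (nvidiactl and nvidia0); the function name check_cuda_nodes is hypothetical, and the directory argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: check_cuda_nodes is a hypothetical helper name.

# Report whether DIR (default /dev) contains the control node and
# the first device node that the setup script is expected to create.
check_cuda_nodes() {
    dir=${1:-/dev}
    if [ ! -e "$dir/nvidiactl" ]; then
        echo "missing $dir/nvidiactl"
        return 1
    fi
    if [ ! -e "$dir/nvidia0" ]; then
        echo "missing $dir/nvidia0"
        return 1
    fi
    echo "ok"
}
```

Calling check_cuda_nodes with no argument inspects /dev itself.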

avidday · January 7, 2010, 10:10am

Run it by hand as root and see what it does.

cudars · January 7, 2010, 10:36am

I ran the script, and the system answered "permission denied".

So I ran the script with sudo do_script, and then it worked.

So it seems that rc.local should be run with sudo, but how can I achieve that?

avidday · January 7, 2010, 10:49am

It is run by root. You don’t have to change anything - all init scripts are run by process 1 (init, hence the name) as root. My guess is that the script (at boot, with a much more limited set of paths and environment variables) doesn’t find modprobe or mknod. Hard-code the paths to those as you did with lspci and it should work.
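
One way to guard against a limited boot-time PATH is sketched below. It appends the usual system directories to PATH and resolves each command to an absolute path before use. The directory list is an assumption about a typical Linux layout, and require_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only. Assumption: system binaries live in the usual
# directories listed here; adjust for the actual system.
PATH="${PATH:+$PATH:}/usr/sbin:/usr/bin:/sbin:/bin"
export PATH

# require_cmd NAME (hypothetical helper): print the absolute path of
# NAME, or print a diagnostic on stderr and return non-zero.
require_cmd() {
    cmd_path=$(command -v "$1") || {
        echo "cannot find $1 in PATH=$PATH" >&2
        return 1
    }
    echo "$cmd_path"
}
```

In a boot script this could be used as MODPROBE=$(require_cmd modprobe) || exit 1, so that a missing command fails with a message rather than silently.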

cudars · January 7, 2010, 11:44am

I have hard-coded the paths of modprobe and mknod, but it didn’t help.

If I use sudo sh ./do_script, then it works.

It seems that rc.local is not run with root privileges.

What can I do?

avidday · January 7, 2010, 11:53am

For the second time: rc.local is run as root. Just because it doesn’t work doesn’t mean it isn’t being run as root. I have a whole cluster of headless, stateless compute nodes which set up CUDA this way. They have no operating system installed and download the OS image and set themselves up from scratch every time they are rebooted. It really does work.

Reboot into the original “no CUDA devices” state and then try running the exact rc.local script you have installed by hand, as root, and see what it does. Maybe even add some echo messages to the script to see where it gets to when it fails.
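
The echo-message idea can be sketched as a small wrapper that records each step and its exit status in a log file. The log location and the name step are hypothetical choices, and the example commands are left commented out because they need root.

```shell
#!/bin/sh
# Sketch only. Hypothetical log location; any path that is writable
# at boot time would do.
LOG=${LOG:-/tmp/cuda-dev-setup.log}

# step LABEL COMMAND [ARGS...]: log the label, run the command with
# its output appended to the log, then log the command's exit status.
step() {
    label=$1
    shift
    echo "START $label" >> "$LOG"
    "$@" >> "$LOG" 2>&1
    rc=$?
    echo "END $label (exit $rc)" >> "$LOG"
    return $rc
}

# Example use inside the boot script (requires root, so commented out;
# the absolute paths are assumptions):
# step "load module" /sbin/modprobe nvidia
# step "create nvidiactl" /bin/mknod -m 666 /dev/nvidiactl c 195 255
```

After a reboot, the last START line without a matching END line with exit 0 shows where the script stopped working.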

cudars · January 7, 2010, 12:28pm

I have to apologize; this was my mistake.

I had modified /etc/rc.local instead of /etc/init.d/rc.local.

When I added the script to /etc/init.d/rc.local, everything was OK.

You are right, init does execute rc.local with root privileges.

Thanks a lot for your help. You are so kind.

By the way, a whole cluster of headless, stateless compute nodes that have no operating system installed, download the OS image, and set themselves up from scratch every time they are rebooted sounds great. Could you explain it briefly? Just out of curiosity.

avidday · January 7, 2010, 12:39pm

Not much to tell really.

The arrangement is based on Perceus and uses a homemade operating system image which includes the CUDA drivers and runtime support. Nodes boot over gigabit ethernet using standard PXE and get sent a small PXE boot image containing a kernel and a provisioning client for the node to boot. Once up, the provisioning client contacts an administrative server process running elsewhere on the network, which sends down the running operating system image over the wire. It gets uncompressed into a ramdisk and bootstrapped. The rest of the image is served via NFS. The whole boot/setup process takes about 30 seconds. If you want to do an operating system reinstall or upgrade, just reboot…

cudars · January 7, 2010, 12:55pm

Thanks a lot.

You did a great job.

I know that Tiny Core Linux can do a similar job to your system.

Maybe I can learn from your system some day.

Thanks again.