334.21 driver returns 999 on cuInit (CUDA)

felixonmars · March 4, 2014, 4:35am

I’ve tested the same system setup with a 331.49 driver, which returns 0 correctly.

My small piece of test code:

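#include <stdio.h>
#include <dlfcn.h>

int main(void) {
  /* Load the driver library at run time and look up cuInit by name. */
  void *cudalib = dlopen("libcuda.so", RTLD_NOW);
  if (!cudalib) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }
  int (*cu_init)(unsigned int) = (int (*)(unsigned int)) dlsym(cudalib, "cuInit");
  if (!cu_init) { fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }
  int retval = cu_init(0);  /* 0 is success; 999 is CUDA_ERROR_UNKNOWN */
  printf("%d\n", retval);
  dlclose(cudalib);
  return 0;
}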

To test:

gcc test.c -o test -ldl
./test

felixonmars · March 4, 2014, 10:00am

Hmm, I figured out the main problem: you have to run a CUDA program as root once, and after that all CUDA programs can be run as a regular user.

Even a manual modprobe of nvidia_uvm does not fix this; I still have to run a program (the one above, for example) as root once.

Any help will be really appreciated!

rdahlgren · March 4, 2014, 7:20pm

I recently updated to 334.21-1 on Arch Linux. Prior to the upgrade, the CUDA 5.5 samples all ran correctly as a normal user. Since the upgrade, deviceQueryDrv emits the following error message:

./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
/usr/bin/nvidia-modprobe: unrecognized option: "-u"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for usage
       information.

cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL

When running deviceQueryDrv as root, I get the following slightly different output:

./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
modprobe: FATAL: Module nvidia-uvm not found.
cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL

Of note here is the apparently missing nvidia-uvm kernel module. Other threads in this forum mention that this module is unused - perhaps this changed with 334.21-1?

sL1pKn07 · March 4, 2014, 8:05pm

It works without sudo with the CUDA 6.0 RC.

EDIT: The deviceQueryDrv sample runs without root.

rdahlgren · March 4, 2014, 9:22pm

Just applied for CUDA developer access to get at the RC. I’ll reply back when I’ve tried it.

dbtx · March 9, 2014, 8:39am

The reason it works after running as root is that root has the right to create the device node. Once it’s created, users can run programs, but only because by default the node is owned by root, group root, and world read/writable… seriously? I’m on Funtoo, so I first added nvidia_uvm to /etc/conf.d/modules. That way the module is always loaded, but the node still doesn’t get created. I also have a local script (/etc/local.d/nv_smi_pm.start) where I switch on persistence mode, so I added these lines to it:

mknod -m 660 /dev/nvidia-uvm c 249 0
chgrp video /dev/nvidia-uvm

Now everything works. I suppose you could write a proper udev rule, but I’m not on that.

Update:
I just discovered nvidia-modprobe. If you run it as root:

nvidia-modprobe -c0 -u

it loads the module and creates the node, just as if it had been auto-created. The --help output indicates it was meant to be setuid so that it works for everyone, but package maintainers might have other ideas. Those default permissions are terribly DoS-happy.

felixonmars · March 27, 2014, 3:46pm

The device node should really be created by the nvidia-uvm module itself. I’ve written a udev rule that is the wrong way to do it, but it works:

KERNEL=="nvidia_uvm", RUN+="/usr/bin/bash -c '/usr/bin/mknod -m 660 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \  -f 1) 0; /usr/bin/chgrp video /dev/nvidia-uvm'"

Please, Nvidia, fix this!

vacaloca · April 11, 2014, 5:19am

I used a similar rule under Ubuntu 14.04. I ran into this after I decided to install driver 337.12 from xorg-edgers.

My first issue was that the kernel 3.12 patch to the uvm module was outdated in the xorg-edgers repo of the driver, so I kept getting a module build error. I made the changes to the file manually and compiled with:

dkms install -m nvidia-337-uvm/337.12

Next, I realized that I had this issue where CUDA programs work only after running one with sudo, so I tried the rule felixonmars posted. For me it seems to need 666 permissions; otherwise I still get the same issue. I also manually add nvidia and nvidia-uvm to /etc/modules and rm /dev/nvidia-uvm before I recreate it. I don’t need the chgrp video line. Also, on Ubuntu 14.04 mknod and chgrp are in /bin, not /usr/bin.

Just figured I’d add this here in case someone else is struggling with this…

bmerry · April 11, 2014, 1:14pm

For anyone trying to figure out how to fix the patch failure: I just edited /usr/src/nvidia-337-uvm-337.12/dkms.conf and commented out the line

PATCH[0]="buildfix_kernel_3.12.patch"

and then ran the dkms command from comment #8.

I’m running saucy with a 3.11 kernel.
