I’ve tested the same system setup with a 331.49 driver, which returns 0 correctly.
My small piece of test code:
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *cudalib = dlopen("libcuda.so", RTLD_NOW);
    int (*p_cuInit)(unsigned int) = (int (*)(unsigned int)) dlsym(cudalib, "cuInit");
    int retval = p_cuInit(0);
    printf("%d\n", retval);   /* 0 means CUDA_SUCCESS */
    return 0;
}
To test:
gcc test.c -o test -ldl
./test
Hmm, I figured out the main problem: you always have to run a CUDA program as root once, and afterwards all CUDA programs can be run as a regular user.
Even manually modprobing nvidia_uvm could not fix this; I still have to run a program (the one above, for example) as root once.
Any help will be really appreciated!
I recently updated to 334.21-1 on Arch Linux. Prior to the upgrade, the CUDA 5.5 samples all ran correctly as a normal user. Since the upgrade, deviceQueryDrv emits the following error message:
./deviceQueryDrv Starting...
CUDA Device Query (Driver API) statically linked version
/usr/bin/nvidia-modprobe: unrecognized option: "-u"
ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for usage
information.
cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL
When running deviceQueryDrv as root, I get the following slightly different output:
./deviceQueryDrv Starting...
CUDA Device Query (Driver API) statically linked version
modprobe: FATAL: Module nvidia-uvm not found.
cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL
Of note here is the apparently missing nvidia-uvm kernel module. Other threads in this forum mention that this module is unused - perhaps this changed with 334.21-1?
Works without sudo with CUDA 6.0 RC.
EDIT: the deviceQueryDrv sample runs without root.
Just applied for Cuda developer access to get at the RC. I’ll reply back when I try it.
dbtx
6
The reason it works after running as root is that root has the right to create the device node. Once it's created, users can run programs, but only because by default the node is owned by root, group root, and world read/writable… seriously? I'm on Funtoo, so I first added nvidia_uvm to /etc/conf.d/modules so it's always loaded, but the node still doesn't get created. I also have a local script (/etc/local.d/nv_smi_pm.start) where I switch on persistence mode, so I added these lines to it:
mknod -m 660 /dev/nvidia-uvm c 249 0
chgrp video /dev/nvidia-uvm
Now everything works. I suppose you could write a proper udev rule, but I haven't gotten around to that.
Update:
I just discovered nvidia-modprobe. If you run it as root:
nvidia-modprobe -c0 -u
it loads the module and creates the node just as it would be auto-created… the --help output indicates it was meant to be installed setuid so it works for everyone, but package maintainers might have other ideas. Those default permissions are terribly DoS-happy.
The device node should really be created by the nvidia-uvm module itself. In the meantime, I've made a quick-and-dirty udev rule that works:
KERNEL=="nvidia_uvm", RUN+="/usr/bin/bash -c '/usr/bin/mknod -m 660 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \  -f 1) 0; /usr/bin/chgrp video /dev/nvidia-uvm'"
Please, Nvidia, fix this!
I used a similar rule under Ubuntu 14.04, just ran into this after I decided to install driver 337.12 from xorg-edgers.
My first issue was that the kernel 3.12 patch for the uvm module was outdated in the xorg-edgers package of the driver, so I kept getting a module build error. I made the changes manually to the file and compiled with:
dkms install -m nvidia-337-uvm/337.12
Next, I realized I had the same issue: CUDA programs only work after sudo. So I tried the rule felixonmars posted, and for me it seems to need 666 permissions; otherwise I still get the same problem. I also manually add nvidia and nvidia-uvm to /etc/modules and do an rm /dev/nvidia-uvm before recreating the node, and I don't need the chgrp video line. Also, on Ubuntu 14.04, mknod and chgrp are in /bin, not /usr/bin.
Just figured I’d add this here in case someone else is struggling with this…
bmerry
9
For anyone trying to figure out how to fix the patch failure: I just edited /usr/src/nvidia-337-uvm-337.12/dkms.conf and commented out the line
PATCH[0]="buildfix_kernel_3.12.patch"
and then ran the dkms command from comment #8.
I’m running saucy with a 3.11 kernel.