I have added a CUDA-based GPU kernel as an alternate computation path for CUDA-capable target systems. In trying to understand how this will work, I have found that I get the expected “device does not support cuda” message when running the app on an older system with a non-CUDA-capable GPU. However, I have also encountered a case in which the program is seemingly forced into emulation mode (“Using device 0: Device emulation (CPU)”) on a system which should be CUDA-capable. Is this an incompatibility between my program and the target system, or an obvious indicator that the system cannot support CUDA code?
The system in question includes a GeForce 9500 GT, runs openSUSE 11.1 x86_64 Linux, and contains /usr/lib64/libcuda.so.180.22.
I build the test app on a RHEL5.1 x86_64 system and copy the app, along with the libcudart.so.2 and libcufft.so.2 libs, to the target system.
ldd indicates all .so references are being satisfied.
This program will be run by various users on several different vintages of Linux, NVIDIA drivers, and GPUs over which I will have no control, so understanding how and why it elicits different responses is quite important.
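Since the app will land on machines nobody can inspect by hand, the ldd check mentioned above could be scripted. What follows is only a minimal sketch: the function names are made up for illustration, and it assumes the usual `name => not found` wording that ldd prints for unresolved libraries. Note that ldd only reports link-time dependencies, so a library that is opened at run time will not show up here even if it is missing.

```python
import subprocess


def unresolved_libs(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is everything to the left of the arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


def run_ldd(binary):
    """Run ldd on a binary and return its standard output as text."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return result.stdout
```

Usage would be something like `unresolved_libs(run_ldd("./myapp"))`, where `./myapp` is a placeholder for the real binary. An empty list means every link-time reference was satisfied.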
What compiler did you build with?
I used nvcc “release 2.0, V0.2.1221”
gcc used was 4.1.2 20071124
Hmmmm… so if you used CUDA 2.0 to build, included all .so files, and did everything with the pathing right, you should have no problems. However, I remember one thing from my dealings with SuSE (haven’t used it in a long time, so grain of salt required): check that /dev/nvidia* have 0666 permissions. I remember that I had to do this on SuSE but on no other distribution ever, so if you’re trying to do this over ssh that might be the problem.
You may well be right. The current permissions are only 660. Unfortunately, I need to wait until someone with root privileges is available to change it.
However, I think you’ve answered my question already: what I did (builds, .so’s, ldd check, etc.) should have been enough to get a non-Emu execution.
I’ll definitely reply back, since your info should be published, as it implies a general gotcha for using GPUs remotely on SUSE.
Any idea what “/dev/nvidiactl” is as opposed to “/dev/nvidia0”? On my RHEL5.1 build system I see 666 for nvidia0 but only 600 for nvidiactl, and that works fine, since I’m shown as the owner of /dev/nvidiactl.
/dev/nvidiaN is each card, while /dev/nvidiactl is the system-wide management interface (I think).
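For anyone wanting to check this without root, here is a rough sketch that lists the /dev/nvidia* nodes with their mode bits and reports whether the current user can open them for reading and writing. The function name is hypothetical, and the sketch assumes that read/write access to each node is what matters, as the permission discussion above suggests. It uses `os.access`, so ownership (as in the 600-but-owner case) and group membership are both taken into account.

```python
import glob
import os
import stat


def check_device_nodes(paths):
    """Return (path, octal mode, usable) for each device node.

    'usable' is True when the current user can both read and write the
    node, whether through owner, group, or world permission bits.
    """
    report = []
    for path in paths:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        usable = os.access(path, os.R_OK | os.W_OK)
        report.append((path, oct(mode), usable))
    return report


if __name__ == "__main__":
    # The glob matches /dev/nvidia0, /dev/nvidia1, ... and /dev/nvidiactl.
    nodes = sorted(glob.glob("/dev/nvidia*"))
    if not nodes:
        print("no /dev/nvidia* nodes found")
    for path, mode, usable in check_device_nodes(nodes):
        print(f"{path}: mode {mode}, usable by this user: {usable}")
```

Running this as the affected user, ideally over the same ssh session that shows the problem, would show at a glance which node is blocking access.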
Turns out the incomplete permissions on /dev/nvidia* were only the first of two problems. In addition, there was a missing symlink of libcuda.so to libcuda.so.1 in /usr/lib64 on the target system. (A little detective work with strace allowed me to find that.) No idea if the missing symlink is another SUSE-ism or operator error during the NVIDIA driver install on that system.
In any event, it appears that for the target SUSE 11.1 system, permissions need to be opened on the /dev/nvidia* devices and/or all users must be made a member of the “video” group. That would seem to be a general weakness with GPGPU usage thereon.
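A quicker way than strace to spot this kind of problem might be to list the libcuda names present in the library directory and try to load the driver library by name. The sketch below is only illustrative: the function names are invented, and since the thread does not say exactly which name the runtime asks for, it simply tries both `libcuda.so` and `libcuda.so.1`.

```python
import ctypes
import os


def libcuda_names(libdir):
    """Map each libcuda file name in libdir to its symlink target (or None)."""
    found = {}
    for name in sorted(os.listdir(libdir)):
        if name.startswith("libcuda.so"):
            full = os.path.join(libdir, name)
            found[name] = os.readlink(full) if os.path.islink(full) else None
    return found


def can_load(soname):
    """Try to load a shared library by name through the dynamic loader."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    libdir = "/usr/lib64"  # the directory reported in this thread
    if os.path.isdir(libdir):
        for name, target in libcuda_names(libdir).items():
            print(name, "->", target if target else "(regular file)")
    for soname in ("libcuda.so", "libcuda.so.1"):
        print(soname, "loadable:", can_load(soname))
```

If the versioned file (for example libcuda.so.180.22) is listed but one of the short names is missing or not loadable, the symlinks are the likely culprit.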
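To tell users up front whether they are affected, the group membership side could be checked with something like the following. It is a sketch under the assumption that the device nodes are group-owned by “video” as described above; the helper name and its `gids` parameter are made up for illustration.

```python
import grp
import os


def in_group(group_name, gids=None):
    """Return True/False for membership, or None if the group does not exist.

    gids defaults to the current process's supplementary groups plus its
    effective group; passing a set explicitly is mainly useful for testing.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return None
    if gids is None:
        gids = set(os.getgroups()) | {os.getegid()}
    return gid in gids


if __name__ == "__main__":
    print("member of 'video':", in_group("video"))
```

Keep in mind that a newly added group membership normally only takes effect for sessions started after the change, so a user may need to log in again before this reports True.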