I’m in the process of making a Tesla PSC available to other users over a LAN. For the time being, users connect via SSH, but they complain that their code fails with the following error:
cudaSafeCall() Runtime API error in file <scan.cu>, line 100 : no CUDA-capable device is available.
When the same user logs in through the GUI everything works, so I’m guessing the relevant drivers don’t get loaded when connecting over SSH. Does anybody know how to fix this?
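I assume the quick way to check from an SSH session is something like the following (standard commands, nothing driver-specific, so please correct me if there is a better way):

ls -l /dev/nvidia*    # do the device nodes exist, and are they world read/writable?
lsmod | grep nvidia   # is the kernel module actually loaded?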
Thanks for the quick reply! I have the following in /dev
nvidia0
nvidia1
nvidia2
nvidiactl
You need to be root to run xdm, so that’s not really an option. Which file should I add my version of the mknod commands you suggested to? It might be better if I could add something to the login script that runs when people connect, because rebooting this machine isn’t really an option…
It is covered in the toolkit release notes. Add an init script to the boot sequence and the device nodes will be created automagically once at boot time, and then you will never, ever have to worry about it again. If you run nvidia-smi in the same script in daemon mode with a modest polling interval, any compute-exclusive settings you choose to apply will also be retained.
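From memory, that script looks roughly like this; double-check the release notes for the exact version, and treat the nvidia-smi line as a placeholder since its flags vary between driver releases (195 is the standard NVIDIA character device major):

#!/bin/bash
# Load the NVIDIA kernel module (normally X does this, which is why GUI logins work)
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
    # Count the NVIDIA controllers and create one device node per GPU
    N3D=$(lspci | grep -i NVIDIA | grep -c "3D controller")
    NVGA=$(lspci | grep -i NVIDIA | grep -c "VGA compatible controller")
    N=$((N3D + NVGA - 1))
    for i in $(seq 0 $N); do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi
# Optional: keep nvidia-smi polling so compute-exclusive settings stick around
# (check nvidia-smi --help on your driver for the exact loop/interval flags)
# nvidia-smi -l 60 > /dev/null &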
I don’t know anything about Fedora, so I can’t answer where the commands go. For me (on Gentoo) it’s an init script in /etc/init.d/cuda.
It doesn’t seem to make sense to me that the device nodes should be created on user log-on. Run the commands once to solve the problem now, and add them to the boot sequence so the problem gets a permanent fix (rough commands at the end of this post).
I was thinking the same about xdm: do it at boot, don’t have users perform it.
IIRC we thought the nvidia-smi approach looked good, but we had some issues with it and didn’t bother.
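To be concrete, this is roughly what I mean; the Gentoo command is what I actually use, while the Fedora ones are the standard chkconfig equivalents that I haven’t verified here (and chkconfig needs the usual header comments in the script):

# Run the script once by hand as root to fix the current session, no reboot needed
sh /etc/init.d/cuda

# Gentoo: add it to the default runlevel
rc-update add cuda default

# Fedora / Red Hat: register it with chkconfig
chkconfig --add cuda
chkconfig cuda on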
Hmm… read the instructions? That could be an idea. Yes, it is indeed discussed in the toolkit release notes, so I have my solution. Many thanks to you both for your quick responses…
I agree that it’s better to set this up at boot time, but as I mentioned in a previous post, rebooting this machine is not an option any time soon, which is why I was looking for a short-term workaround until I can reboot. It’s all good though, this issue is resolved :)