Can't get CUDA working on Debian

Hi everybody,
I'm not used to posting on forums like this, but I've been trying to get CUDA working on my machine for days and can't manage it… well, almost.

I installed CUDA successfully two weeks ago, but when I came back to it because I needed to get some work done, I could no longer communicate with my GPU. I've tried everything and reinstalled many times following different tutorials, with no luck.
I even reinstalled Linux, and it still doesn't work. Here's the procedure I'm following, along with some information about my setup.

I'm running the x64 version of Debian 8.5. I installed CUDA with apt-get install nvidia-cuda-toolkit, which worked fine. After rebooting I can compile successfully, but running the program fails with "error 38", which as far as I understand means a problem communicating with the GPU. (Run as root.)
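To see what the CUDA runtime itself says, rather than a bare "error 38", one option is to ask libcudart directly. Below is a minimal Python/ctypes sketch, assuming the runtime library can be located under the name cudart (for example the libcudart.so installed with the nvidia-cuda-toolkit packages; a library outside the linker cache will not be found). It calls cudaGetDeviceCount and passes the returned code to cudaGetErrorString. The helper name probe_cuda_runtime is hypothetical.

```python
import ctypes
import ctypes.util


def probe_cuda_runtime():
    """Ask the CUDA runtime how many devices it can see.

    Returns (error_code, message, device_count), or None when no
    libcudart shared library can be located or loaded.
    """
    # Assumption: the runtime is discoverable as "cudart" via the linker cache.
    path = ctypes.util.find_library("cudart")
    if path is None:
        return None
    try:
        cudart = ctypes.CDLL(path)
    except OSError:
        return None

    # cudaError_t cudaGetDeviceCount(int *count)
    cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaGetDeviceCount.restype = ctypes.c_int
    # const char *cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p

    count = ctypes.c_int(0)
    err = cudart.cudaGetDeviceCount(ctypes.byref(count))
    raw = cudart.cudaGetErrorString(err)
    message = raw.decode() if raw else "unknown error"
    return err, message, count.value


if __name__ == "__main__":
    result = probe_cuda_runtime()
    if result is None:
        print("libcudart not found")
    else:
        print("error %d: %s (%d device(s))" % result)
```

Run it as the same user as the failing program so the result is comparable.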

nvidia-smi (run as root) returns the following:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

ls /dev/nvidia* returns:
/dev/nvidia0 /dev/nvidiactl

I also tried installing with the .run file made for Ubuntu, but got another strange error that I can't remember. Could you guide me through the installation?
I can provide whatever information is needed. I don't understand why it isn't working.

Any help would be greatly appreciated.

modinfo nvidia-current | grep version
version: 340.96
vermagic: 3.16.0-4-amd64 SMP mod_unload modversions

Config:
Laptop: MSI GE70
GPU: GTX 660M
Arch: x64
OS: Debian 8.5

Follow the instructions in the NVIDIA CUDA Installation Guide for Linux.

You seem to be having a problem with the driver installation.

Following the install guide should install the correct driver along with the CUDA toolkit.

I just ran some more tests. I also tried installing Bumblebee and running the program through optirun; here's the output:

root@ax:/home/ax/Bureau/AEP# optirun nvidia-smi
[ 377.074673] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied

[ 377.074718] [ERROR]Aborting because fallback start is disabled.

Running dmesg shows the following:

[ 263.586679] bbswitch: enabling discrete graphics
[ 298.872021] nvidia: module license ‘NVIDIA’ taints kernel.
[ 298.872025] Disabling lock debugging due to kernel taint
[ 298.878711] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[ 298.878976] [drm] Initialized nvidia-drm 0.0.0 20150116 for 0000:01:00.0 on minor 1
[ 298.878981] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 352.39 Fri Aug 14 18:09:10 PDT 2015
[ 298.880483] nvidia 0000:01:00.0: irq 49 for MSI/MSI-X
[ 298.883634] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883683] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883708] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883731] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883753] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883774] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883810] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 298.883831] ACPI Warning: _SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 304.275881] NVRM: failed to copy vbios to system memory.
[ 304.276560] NVRM: RmInitAdapter failed! (0x30:0xffff:851)
[ 304.276573] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 304.276608] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5

Any ideas? It seems the driver can't be loaded properly (and that I have some ACPI problem; my battery is fairly old).
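The modinfo and dmesg outputs above can be compared mechanically. The Python sketch below pulls the module version from the "NVRM: loading" line in dmesg, the version from modinfo, and every NVRM line that mentions a failure. The helper names are hypothetical, and the regular expressions are written only against the lines quoted in this thread, so other driver versions may format them differently.

```python
import re

# Patterns written against the dmesg/modinfo lines quoted in this thread.
NVRM_LOAD = re.compile(r"NVRM: loading NVIDIA .*Kernel Module\s+(\d+\.\d+)")
NVRM_FAIL = re.compile(r"NVRM: .*(?:failed|error)", re.IGNORECASE)
MODINFO_VERSION = re.compile(r"^version:\s*(\S+)", re.MULTILINE)


def loaded_module_version(dmesg_text):
    """Version from the "NVRM: loading ... Kernel Module X.Y" line, or None."""
    match = NVRM_LOAD.search(dmesg_text)
    return match.group(1) if match else None


def installed_module_version(modinfo_text):
    """Version from the "version:" line of modinfo output, or None."""
    match = MODINFO_VERSION.search(modinfo_text)
    return match.group(1) if match else None


def nvrm_failures(dmesg_text):
    """Every NVRM line that mentions a failure or an error."""
    return [line.strip() for line in dmesg_text.splitlines()
            if NVRM_FAIL.search(line)]
```

Given the output quoted above, loaded_module_version returns "352.39" while installed_module_version returns "340.96". A mismatch like that may suggest that a module from a RUN installer and the Debian-packaged nvidia-current module are both present, though that is only an inference from these two outputs.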

This depends on how you’ve installed the driver.

I use the RUN files rather than the Debian packages, because the packages are always a little behind.

The problem with the RUN installer is that whenever Debian updates Mesa, Xorg, the kernel or some other package indirectly related to the driver, the update can remove symlinks, libraries and even the driver's kernel module. It's important to look at which packages are being updated whenever you run apt or any other package manager.
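One way to keep an eye on those updates after the fact is to scan apt's history log. This is a minimal Python sketch, assuming the usual stanza layout of /var/log/apt/history.log (a "Start-Date:" line followed by lines such as "Upgrade: pkg:arch (old, new), ..."). The watch list in WATCHED and the name risky_changes are hypothetical; the list is a starting point, not an authoritative set of packages that affect the driver.

```python
import re

# Hypothetical watch list: name fragments of packages whose updates
# can disturb a RUN-installed driver.
WATCHED = ("linux-image", "linux-headers", "xserver-xorg", "mesa", "nvidia")
ACTIONS = ("Install", "Upgrade", "Remove", "Purge")
# "pkg:arch (" -> captures "pkg"
PKG = re.compile(r"([a-z0-9][a-z0-9+.\-]*)(?::[a-z0-9]+)?\s+\(")


def risky_changes(history_text, watched=WATCHED):
    """Return (start_date, action, package) for each watched package change."""
    results = []
    date = None
    for line in history_text.splitlines():
        key, sep, rest = line.partition(": ")
        if not sep:
            continue
        if key == "Start-Date":
            date = rest.strip()
        elif key in ACTIONS:
            for pkg in PKG.findall(rest):
                if any(fragment in pkg for fragment in watched):
                    results.append((date, key, pkg))
    return results


if __name__ == "__main__":
    try:
        with open("/var/log/apt/history.log") as log:
            text = log.read()
    except OSError:  # missing file or insufficient permissions
        text = ""
    for date, action, pkg in risky_changes(text):
        print(date, action, pkg)
```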

The quickest way to check whether your driver is still alive and kicking is to open the Nvidia settings (nvidia-settings) and click through each page. An empty video or OpenGL page often indicates that a Debian update has undone something.
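For a check that doesn't need the desktop, a rough alternative is to see whether nvidia-smi runs cleanly. This Python sketch assumes nvidia-smi exits with a nonzero status when it cannot reach the driver, as in the failure quoted at the top of this thread; driver_responds is a hypothetical helper name.

```python
import subprocess


def driver_responds(timeout=30):
    """Return True if `nvidia-smi` runs and exits with status 0.

    A missing binary, a timeout or a nonzero exit status returns False.
    """
    try:
        status = subprocess.call(
            ["nvidia-smi"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return status == 0


if __name__ == "__main__":
    print("driver OK" if driver_responds() else "driver not responding")
```

subprocess.call is used instead of subprocess.run so the sketch also works on the older Python 3 shipped with Debian 8.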

If an update has undone something, simply rerun the Nvidia installer:

  • log out of the desktop
  • switch to the console with Ctrl-Alt-F1
  • log in as root
  • stop the display manager with “service lightdm stop” or “service gdm stop” …
  • rerun the Nvidia RUN installer
  • start the display manager with “service lightdm start” or “service gdm start” …
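The stop / reinstall / start part of the console steps above can be sketched as a small Python script. This is a minimal sketch, assuming you are already logged in as root on a text console (logging out and switching consoles can't be scripted here). The installer path is yours to supply, and reinstall_steps and run_steps are hypothetical names. It defaults to a dry run that only prints the commands.

```python
import subprocess


def reinstall_steps(installer, display_manager="lightdm"):
    """Commands for stopping the display manager, rerunning the installer
    and starting the display manager again.

    installer: path to the downloaded RUN file (supply your own).
    display_manager: service name, e.g. "lightdm" or "gdm".
    """
    return [
        ["service", display_manager, "stop"],
        ["sh", installer],  # the RUN file is a self-extracting shell script
        ["service", display_manager, "start"],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises CalledProcessError on a nonzero exit,
            # so a failed step stops the sequence.
            subprocess.check_call(cmd)
```

For example, run_steps(reinstall_steps("/path/to/driver.run", "gdm"), dry_run=False) would stop gdm, run the installer and start gdm again, stopping at the first command that fails.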

You should see the window system coming up again.

If you are using the Nvidia driver from the Debian repository, you'll probably have to reinstall the package. I don't have any experience with those packages, so I can't really help you there.

If you've been using the driver from the RUN installer that came with CUDA, it's best not to. Just get the latest driver from Nvidia's driver download page; you'll get a more recent driver that way.