Samples not working due to (erroneous?) insufficient driver version in Debian Jessie with Optimus card

Hi everyone,
Today I managed to install the CUDA 6.5 Toolkit on Debian Jessie. Unfortunately, the samples do not work, even though they compile without problems. I believe my driver version is sufficient (I have 340.65, and I think the 6.5 toolkit requires a minimum driver version of 340.21):

kmdouglass@kmd-laptop1:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ optirun ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

I have not been able to find much help online so far; this appears to be a common error, and it is not clear to me whether it is related to Optimus cards. Would anyone be able to help me with this problem?
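
To narrow things down, here is a sketch of the checks that should show which copy of libcuda.so.1 is actually visible under optirun (the exact paths will vary between systems):

# List every copy of the driver-side CUDA library installed on the system
find / -name 'libcuda.so*' 2>/dev/null
# Show what the dynamic loader cache knows about libcuda
ldconfig -p | grep -i libcuda
# Show the extra library paths that optirun prepends for the discrete card
optirun bash -c 'echo "$LD_LIBRARY_PATH"'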

I am using the version 340.65-2 driver from the Debian non-free repository, which is higher than the driver version provided with the CUDA 6.5.14 .run file:

kmdouglass@kmd-laptop1:~/src$ optirun sudo cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.65  Tue Dec  2 09:50:34 PST 2014
GCC version:  gcc version 4.8.4 (Debian 4.8.4-1)

Note that I actually compiled the samples with gcc 4.8.2, since newer compiler versions are not supported by the 6.5 toolkit.
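
In case it helps, this is a sketch of how to point nvcc at an older host compiler (the gcc-4.8 path and the vectorAdd sample are only placeholders; adjust to your setup):

# Tell nvcc which host compiler to use; -ccbin accepts the compiler binary
# (or, on older toolkits, the directory containing it)
nvcc -ccbin /usr/bin/gcc-4.8 -o vectorAdd vectorAdd.cu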

I use the optirun command because I am using bumblebee (http://bumblebee-project.org/) with my Optimus card.
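
As a side note, if I understand the Bumblebee 3.2 documentation correctly, pure compute programs can also be run without spinning up the secondary X server, which helps separate CUDA problems from X problems:

# Run a CUDA-only program on the discrete GPU without the secondary X server
# (--no-xorg is a Bumblebee 3.2+ option, as far as I know)
optirun --no-xorg ./deviceQuery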

kmdouglass@kmd-laptop1:~/src$ sudo aptitude versions bumblebee
Package bumblebee:                        
i A 3.2.1-7                                       stable                    995 

Package bumblebee-dbg:
p   3.2.1-7                                       stable                    995 

Package bumblebee-nvidia:
i   3.2.1-7                                       stable                    995

GPU is a GeForce GT 555M:

kmdouglass@kmd-laptop1:~/src$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 555M] (rev a1)

Linux and Debian version:

kmdouglass@kmd-laptop1:~/src$ uname -a
Linux kmd-laptop1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux
kmdouglass@kmd-laptop1:~/src$ cat /etc/debian_version 
stretch/sid

nvidia-smi output:

kmdouglass@kmd-laptop1:~/src$ nvidia-smi 
Sat Nov  7 22:49:58 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.65     Driver Version: 340.65         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 555M     Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   50C    P0    N/A /  N/A |      6MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

Output from “optirun glxheads,” which successfully displays the rotating triangle:

kmdouglass@kmd-laptop1:~/src$ optirun glxheads
glxheads: exercise multiple GLX connections (any key = exit)
Usage:
  glxheads xdisplayname ...
Example:
  glxheads :0 mars:0 venus:1
Name: :0.0
  Display:     0x1892060
  Window:      0x3800002
  Context:     0x198d860
  GL_VERSION:  4.4.0 NVIDIA 340.65
  GL_VENDOR:   NVIDIA Corporation
  GL_RENDERER: GeForce GT 555M/PCIe/SSE2

PATH and LD_LIBRARY_PATH environment variables:

kmdouglass@kmd-laptop1:~/src$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/local/ImageJ/:/usr/local/cuda-6.5/bin:/usr/local/ImageJ/:/usr/local/cuda-6.5/bin
kmdouglass@kmd-laptop1:~/src$ echo $LD_LIBRARY_PATH 
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-7-openjdk-amd64:/usr/lib/jvm/java-7-openjdk-amd64/include:/usr/local/cuda-6.5/lib64
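
For comparison, this is a sketch of the exports the runfile install guide calls for (assuming the default /usr/local/cuda-6.5 location); the duplicated entries in my PATH above should be harmless:

# Typical ~/.bashrc additions for a runfile install of CUDA 6.5
export PATH=/usr/local/cuda-6.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH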

Bumblebee configuration:

# Configuration file for Bumblebee. Values should **not** be put between quotes

## Server options. Any change made in this section will need a server restart
# to take effect.
[bumblebeed]
# The secondary Xorg server DISPLAY number
VirtualDisplay=:8
# Should the unused Xorg server be kept running? Set this to true if waiting
# for X to be ready is too long and don't need power management at all.
KeepUnusedXServer=false
# The name of the Bumblebee server group (GID name)
ServerGroup=bumblebee
# Card power state at exit. Set to false if the card should be ON when Bumblebee
# server exits.
TurnCardOffAtExit=false
# The default behavior of '-f' option on optirun. If set to "true", '-f' will
# be ignored.
NoEcoModeOverride=false
# The Driver used by Bumblebee server. If this value is not set (or empty),
# auto-detection is performed. The available drivers are nvidia and nouveau
# (See also the driver-specific sections below)
Driver=
# Directory with a dummy config file to pass as a -configdir to secondary X
XorgConfDir=/etc/bumblebee/xorg.conf.d

## Client options. Will take effect on the next optirun executed.
[optirun]
# Acceleration/ rendering bridge, possible values are auto, virtualgl and
# primus.
Bridge=auto
# The method used for VirtualGL to transport frames between X servers.
# Possible values are proxy, jpeg, rgb, xv and yuv.
VGLTransport=proxy
# List of paths which are searched for the primus libGL.so.1 when using
# the primus bridge
PrimusLibraryPath=/usr/lib/x86_64-linux-gnu/primus:/usr/lib/i386-linux-gnu/primus:/usr/lib/primus:/usr/lib32/primus
# Should the program run under optirun even if Bumblebee server or nvidia card
# is not available?
AllowFallbackToIGC=false

# Driver-specific settings are grouped under [driver-NAME]. The sections are
# parsed if the Driver setting in [bumblebeed] is set to NAME (or if auto-
# detection resolves to NAME).
# PMMethod: method to use for saving power by disabling the nvidia card, valid
# values are: auto - automatically detect which PM method to use
#         bbswitch - new in BB 3, recommended if available
#       switcheroo - vga_switcheroo method, use at your own risk
#             none - disable PM completely
# https://github.com/Bumblebee-Project/Bumblebee/wiki/Comparison-of-PM-methods

## Section with nvidia driver specific options, only parsed if Driver=nvidia
[driver-nvidia]
# Module name to load, defaults to Driver if empty or unset
KernelDriver=nvidia-current
PMMethod=auto
# colon-separated path to the nvidia libraries
LibraryPath=/usr/lib/x86_64-linux-gnu/nvidia:/usr/lib/i386-linux-gnu/nvidia:/usr/lib/nvidia
# comma-separated path of the directory containing nvidia_drv.so and the
# default Xorg modules path
XorgModulePath=/usr/lib/nvidia,/usr/lib/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia

## Section with nouveau driver specific options, only parsed if Driver=nouveau
[driver-nouveau]
KernelDriver=nouveau
PMMethod=auto
XorgConfFile=/etc/bumblebee/xorg.conf.nouveau
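
Two quick sanity checks based on the KernelDriver and LibraryPath settings above (the module name is taken from the config; the find is just a broad search):

# Confirm the kernel module named in KernelDriver exists and report its version
modinfo nvidia-current | grep -i '^version'
# Check where libcuda.so.1 actually lives, to compare against LibraryPath
find /usr/lib -name 'libcuda.so*' 2>/dev/null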

/etc/bumblebee/xorg.conf.nvidia:

Section "ServerLayout"
    Identifier  "Layout0"
    Option      "AutoAddDevices" "false"
    Option      "AutoAddGPU" "false"
EndSection

Section "Device"
    Identifier  "DiscreteNvidia"
    Driver      "nvidia"
    VendorName  "NVIDIA Corporation"

#   If the X server does not automatically detect your VGA device,
#   you can manually set it here.
#   To get the BusID prop, run `lspci | egrep 'VGA|3D'` and input the data
#   as you see in the commented example.
#   This Setting may be needed in some platforms with more than one
#   nvidia card, which may confuse the proprietary driver (e.g.,
#   trying to take ownership of the wrong device). Also needed on Ubuntu 13.04.
#   BusID "PCI:01:00:0"

#   Setting ProbeAllGpus to false prevents the new proprietary driver
#   instance spawned to try to control the integrated graphics card,
#   which is already being managed outside bumblebee.
#   This option doesn't hurt and it is required on platforms running
#   more than one nvidia graphics card with the proprietary driver.
#   (E.g. Macbook Pro pre-2010 with nVidia 9400M + 9600M GT).
#   If this option is not set, the new Xorg may blacken the screen and
#   render it unusable (unless you have some way to run killall Xorg).
    Option "ProbeAllGpus" "false"

    Option "NoLogo" "true"
    Option "UseEDID" "false"
    Option "UseDisplayDevice" "none"
EndSection

I’d be happy to provide any other information if it’s needed.

Cheers,
Kyle

UPDATE: I have also tried installing the CUDA 6.0.37 toolkit using the same steps, but I still receive the same error:

kmdouglass@kmd-laptop1:~/NVIDIA_CUDA-6.0_Samples/bin/x86_64/linux/release$ optirun ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Briefly, I first downloaded and compiled gcc 4.7.3 and used it when running cuda_6.0.37_linux_64.run. (I did not install the included driver but am still using the one from the Debian package manager listed above.) The samples compiled without error using the included Makefile.

I’m guessing that this is an error that is not directly related to the driver version. Does anyone have any suggestions?
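
One more diagnostic I can run if it would help: tracing the dynamic loader while the failing sample runs, to see which CUDA libraries it actually picks up (LD_DEBUG=libs is a standard glibc facility; the grep just cuts down the output):

# Trace library resolution while the failing binary runs under optirun
LD_DEBUG=libs optirun ./deviceQuery 2>&1 | grep -i cuda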

Cheers,
Kyle

SECOND UPDATE: I removed all of my toolkit installs from NVIDIA’s .run files and instead installed the nvidia-cuda-toolkit Debian package and its dependencies:

kmdouglass@kmd-laptop1:~/src$ sudo apt-get install nvidia-cuda-toolkit 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  libcuda1 libcuinj64-6.0 libnvcuvid1 libnvidia-compiler nvidia-cuda-dev
  nvidia-cuda-doc nvidia-cuda-gdb nvidia-libopencl1 nvidia-opencl-common
  nvidia-opencl-dev nvidia-opencl-icd nvidia-profiler nvidia-visual-profiler
Suggested packages:
  nvidia-cuda-mps libcupti-dev
Recommended packages:
  libcuda1-i386 libvdpau-dev
The following NEW packages will be installed:
  libcuda1 libcuinj64-6.0 libnvcuvid1 libnvidia-compiler nvidia-cuda-dev
  nvidia-cuda-doc nvidia-cuda-gdb nvidia-cuda-toolkit nvidia-libopencl1
  nvidia-opencl-common nvidia-opencl-dev nvidia-opencl-icd nvidia-profiler
  nvidia-visual-profiler

After installation, I successfully got deviceQuery to run, and I didn’t even need optirun:

kmdouglass@kmd-laptop1:~/NVIDIA_CUDA-6.0_Samples/bin/x86_64/linux/release$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 555M"
  CUDA Driver Version / Runtime Version          6.5 / 6.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1024 MBytes (1073414144 bytes)
  ( 2) Multiprocessors, ( 48) CUDA Cores/MP:     96 CUDA Cores
  GPU Clock rate:                                1505 MHz (1.50 GHz)
  Memory Clock rate:                             1570 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GeForce GT 555M
Result = PASS

So my guess now is that some of the libraries from the .run files were not installed where the runtime could find them, whereas the package manager set them up properly.

It would still be nice to know what the problem was exactly, though…

I’m not sure exactly what the problem was, but you cannot mix runfile and package manager installations; they are incompatible with each other. This is covered in the Linux Installation Guide.
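
If you want to clean up completely before settling on one method, something along these lines should work (the uninstaller path assumes the default runfile location and the 6.5 naming; check what actually exists on your machine):

# Remove the runfile toolkit install, if its uninstaller is present
ls /usr/local/cuda-6.5/bin/uninstall_cuda_6.5.pl
sudo /usr/local/cuda-6.5/bin/uninstall_cuda_6.5.pl
# Then verify that only the packaged toolkit remains
which -a nvcc
dpkg -l | grep -E 'nvidia-cuda|libcuda'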