Troubles after Ubuntu update

Hi

On 23 August 2016 I received an NVIDIA package update through Ubuntu's default package management system.

After this update the X Window System no longer worked.

But first things first:

System in use:

uname -a
Linux studio16 4.2.0-42-lowlatency #49-Ubuntu SMP PREEMPT Tue Jun 28 23:12:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

The main problem:

Excerpt from /var/log/kern.log:

NVRM: API mismatch: the client has the version 352.99, but
NVRM: this kernel module has the version 352.93.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
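
The two version numbers in that message are the key detail: the user-space driver components (the "client") are at one version and the loaded kernel module is at another. As a minimal sketch, assuming the log lines have the wording shown above, the pair can be pulled out of a kernel log with a small shell function (the function name is made up for illustration):

```shell
# Sketch: extract the client and kernel module versions from an
# NVRM "API mismatch" message. Reads kernel log text on stdin.
# The function name is hypothetical; the matched wording is taken
# from the log excerpt above.
nvrm_mismatch_versions() {
    awk '
        # Return the token that follows the word "version",
        # with a trailing "." or "," stripped.
        function ver_after_keyword(   i, v) {
            for (i = 1; i < NF; i++)
                if ($i == "version") {
                    v = $(i + 1)
                    sub(/[.,]$/, "", v)
                    return v
                }
            return ""
        }
        /NVRM: API mismatch: the client has the version/ { client = ver_after_keyword() }
        /NVRM: this kernel module has the version/       { module = ver_after_keyword() }
        END {
            if (client != "" && module != "")
                print "client=" client " module=" module
        }
    '
}
```

It could be run as `nvrm_mismatch_versions < /var/log/kern.log`. On a live system, /proc/driver/nvidia/version should show the version of the module that is currently loaded, and `modinfo -F version nvidia` the version of the module file installed for the running kernel.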

To me this meant that the installation had not cleaned up all the old code, especially the kernel module, so that it could work with the new version 352.99.

So I removed all CUDA and NVIDIA packages and installed
ONLY the NVIDIA ones:

dpkg --list | fgrep nvidia
ii nvidia-352               352.99-0ubuntu1         amd64    NVIDIA binary driver - version 352.99
ii nvidia-opencl-icd-352    352.99-0ubuntu1         amd64    NVIDIA OpenCL ICD
ii nvidia-prime             0.8.1                   amd64    Tools to enable NVIDIA's Prime
ii nvidia-settings          352.99-0ubuntu1         amd64    Tool for configuring the NVIDIA graphics driver

There are additional packages at version 352.99, such as:

ii libcuda1-352             352.99-0ubuntu1         amd64    NVIDIA CUDA runtime library
ii libxnvctrl0              352.99-0ubuntu1         amd64    NV-CONTROL X extension (runtime library)

In this configuration I get

nvidia: module license 'NVIDIA' taints kernel.
nvidia: module verification failed: signature and/or required key missing - tainting kernel

in /var/log/kern.log.

Afterwards only messages of this type are recorded:

NVRM: RmInitAdapter failed! (0x2d:0x63:1406)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5

What is the problem that keeps the package from installing in a
correct, working way?

br
Rainer

Follow the instructions in the Linux installation guide to completely remove the old driver installation, then reinstall whichever driver you wish.

Which document exactly is the "Linux installation guide"?

Do you mean:

http://developer.download.nvidia.com/compute/cuda/7.5/Prod/docs/sidebar/CUDA_Installation_Guide_Linux.pdf

Section 4.6, Uninstallation?
There is NO /usr/bin/nvidia-uninstall script or program on my system!

Or
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#axzz4IjXKVA86
or
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#axzz4IjXKVA86

or
https://forum.ubuntuusers.de/topic/nvidia-treiber-deinstallieren-und-ursprungszu/

Is removing all nvidia* packages not enough?

And the second piece of advice:

"then reinstall whichever driver you wish"

It was not really my wish: the Ubuntu updater offered the driver update automatically, and THEN the system did NOT work anymore!
I want a secure system, so I try to keep my drivers up to date.
(But I did NOT wish for the system to stop working!)

The Ubuntu community (?) rolled out version 352.99, and the system then had trouble working with the 352.93 kernel module.
Did the system have trouble building the kernel modules during the installation process?

I do not know which is the latest or the correct version!

Thank you for any additional information.
br
Rainer

I mean the first document you listed:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#abstract

And there is no uninstall script because you did not use a runfile installer to install the driver. You used the package manager method.

Instructions are given in the “handle conflicting install methods” section of that document to clean up (remove) either a runfile installer installation or a package manager installation.
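
For a package manager installation on Ubuntu, the cleanup described there comes down, as far as I recall, to purging the driver and CUDA packages with apt-get. A rough sketch of how the candidate package names could be collected from `dpkg --list` output first; the function name and the nvidia/cuda name patterns are assumptions for illustration, not something taken from the guide, and a package such as libxnvctrl0 would not be matched:

```shell
# Sketch: print the names of installed packages that look like part of
# an NVIDIA driver or CUDA installation. Reads `dpkg --list` output on
# stdin. The function name and the name patterns are assumptions.
nvidia_pkg_names() {
    # Field 1 is the dpkg status ("ii" = installed), field 2 the package name.
    awk '$1 == "ii" && $2 ~ /nvidia|cuda/ { print $2 }'
}
```

A dry run that only prints the resulting command, without removing anything, might look like `dpkg --list | nvidia_pkg_names | xargs -r echo sudo apt-get --purge remove`. It is worth reading through the list before running the real command.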

If you accept whatever CUDA or driver updates the Ubuntu community pushes onto your machine, you may well break the CUDA installation.

You can certainly accept most updates; they should not be relevant. But CUDA, or CUDA drivers, or GPU drivers, or things like kernel updates are items that need to be handled correctly.

Following section 2.5, Choose an Installation Method (of cuda-installation-guide-linux), I did what it says:
"It is recommended to use the distribution-specific packages, where possible."

(in the hope that later releases would upgrade correctly).

According to the first error message in /var/log/kern.log,

352.93 was installed, and after the update to 352.99 it no longer worked.

I found that 352.99 did not find my GTX670 GPU. (How could that have worked with 352.93?)

That was the old trouble.
NOW I have cleaned up all kernel modules
and installed 367.44 using the runfile installer.

First: nvidia-smi does recognize the GPU:

Thu Sep  1 23:31:42 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   33C    P8    N/A /  N/A |     77MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

With this version the X server and the browser run again, so I can copy and paste.

BUT:

/var/log/kern.log says:

Sep  1 10:33:45 studio16 kernel: [    8.031387] nvidia: module license 'NVIDIA' taints kernel.
Sep  1 10:33:45 studio16 kernel: [    8.031390] Disabling lock debugging due to kernel taint
Sep  1 10:33:45 studio16 kernel: [    8.033959] nvidia: module verification failed: signature and/or required key missing - tainting kernel
Sep  1 10:33:45 studio16 kernel: [    8.037033] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
Sep  1 10:33:45 studio16 kernel: [    8.037082] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
Sep  1 10:33:45 studio16 kernel: [    8.037089] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  367.44  Wed Aug 17 22:24:07 PDT 2016
Sep  1 10:33:45 studio16 kernel: [    8.057473] rtl8192cu: Chip version 0x10
Sep  1 10:33:45 studio16 kernel: [    8.064117] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  367.44  Wed Aug 17 21:54:40 PDT 2016
Sep  1 10:33:45 studio16 kernel: [    8.071816] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver

On the one hand, signature verification failed;
on the other hand, NVRM reports that it is loading the NVIDIA module.

Sep  1 09:38:02 studio16 kernel: [   16.293965] Request for unknown module key 'nvidia-installer generated signing key: dabac2e40728e77c56bed39ab97623e3a023a052' err -11
Sep  1 09:38:02 studio16 kernel: [   16.295174] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245

Sep  1 10:33:59 studio16 kernel: [   28.727925] nvidia-modeset: Allocated GPU:0 (GPU-a0b37da0-9b40-61c3-4fc7-7b026ea0bbb5) @ PCI:0000:01:00.0

On the one hand, a signing key error;
on the other hand, the UVM driver was loaded.
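
As far as I understand it, these two observations do not contradict each other: the "module verification failed ... tainting kernel" line is a warning, and on a kernel that does not enforce module signatures the module is still loaded, which is what the "NVRM: loading" line reports. A small sketch that summarizes both facts from a kernel log, assuming the message wording shown above (the function name and the output format are made up for illustration):

```shell
# Sketch: report whether the nvidia module triggered a signature
# warning and which driver version NVRM says it loaded.
# Reads kernel log text on stdin. The function name is hypothetical.
nvidia_load_summary() {
    awk '
        BEGIN { warned = "no"; loaded = "none" }
        # Warning only: the kernel is marked tainted, loading continues.
        /nvidia: module verification failed/ { warned = "yes" }
        # NVRM prints its version right after the word "Module".
        /NVRM: loading NVIDIA UNIX/ {
            for (i = 1; i < NF; i++)
                if ($i == "Module") loaded = $(i + 1)
        }
        END { print "signature_warning=" warned " loaded_version=" loaded }
    '
}
```

It could be run as `nvidia_load_summary < /var/log/kern.log`. A loaded version together with a warning would suggest that the signature messages are not what prevents the docker command from working.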

AND the following command fails:

$ nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_367.44: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

Is the kernel module signed incorrectly by the *.run installer?
What is the real problem?

Which is the correct driver version for a GTX670 GPU on Ubuntu 15.04?

If I ran the CUDA installer, it would install a 304 version.
Is that the correct one?

I only need an NVIDIA driver on the host system,
in order to use CUDA etc. in Docker containers.

I found additional information in

http://us.download.nvidia.com/XFree86/Linux-x86_64/352.99/README/index.html

But I can’t understand why most pages there still refer to version 352.93, as in:

http://us.download.nvidia.com/XFree86/Linux-x86_64/352.99/README/selectdriver.html

The NVIDIA graphics driver is bundled in a self-extracting package named NVIDIA-Linux-x86_64-352.93.run.

Were NO updates made for the new version?