installation fails with kernels >= 5.1.x

The same installation runs fine and finishes successfully on kernels < 5.1 (tested with 5.0.17-200.fc29.x86_64),
but it fails on kernels >= 5.1 (currently testing 5.1.15-200.fc29.x86_64):

[INFO]: 
[INFO]: ERROR: An error occurred while performing the step: "Checking to see whether the nvidia-uvm kernel module was successfully built". See /var/log/nvidia-installer.log for details.
[INFO]: 
[INFO]: 
[INFO]: ERROR: The nvidia-uvm kernel module was not created.
[INFO]: 
[INFO]: 
[INFO]: ERROR: The nvidia-uvm kernel module failed to build. This kernel module is required for the proper operation of CUDA. If you do not need to use CUDA, you can try to install this driver package again with the '--no-unified-memory' option.
[INFO]: 
[INFO]: 
[INFO]: ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[INFO]: 
[INFO]: The command `cd ./kernel; /usr/bin/make -k -j24 NV_KERNEL_MODULES="nvidia-uvm" NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/5.1.15-200.fc29.x86_64/source" SYSOUT="/lib/modules/5.1.15-200.fc29.x86_64/build"` failed with the following output:
[INFO]: 
[INFO]: make[1]: Entering directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]: make[2]: Entering directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]:   CC [M]  /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.o
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:187:14: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .fault = uvm_vm_fault_sigbus_wrapper
[INFO]:               ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:187:14: note: (near initialization for ‘uvm_vm_ops_disabled.fault’)
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:525:21: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .fault        = uvm_vm_fault_wrapper,
[INFO]:                      ^~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:525:21: note: (near initialization for ‘uvm_vm_ops_managed.fault’)
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:526:21: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .page_mkwrite = uvm_vm_fault_wrapper,
[INFO]:                      ^~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:526:21: note: (near initialization for ‘uvm_vm_ops_managed.page_mkwrite’)
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:609:21: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .fault        = uvm_vm_fault_sigbus_wrapper,
[INFO]:                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:609:21: note: (near initialization for ‘uvm_vm_ops_semaphore_pool.fault’)
[INFO]: cc1: warning: unrecognized command line option ‘-Wno-address-of-packed-member’
[INFO]: cc1: some warnings being treated as errors
[INFO]: make[3]: *** [/usr/src/kernels/5.1.15-200.fc29.x86_64/scripts/Makefile.build:275: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.o] Error 1
[INFO]: make[3]: Target '__build' not remade because of errors.
[INFO]: make[2]: *** [/usr/src/kernels/5.1.15-200.fc29.x86_64/Makefile:1575: _module_/tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel] Error 2
[INFO]: make[2]: Target 'modules' not remade because of errors.
[INFO]: make[2]: Leaving directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]: make[1]: *** [Makefile:169: sub-make] Error 2
[INFO]: make[1]: Target 'modules' not remade because of errors.
[INFO]: make[1]: Leaving directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]: make: *** [Makefile:81: modules] Error 2
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 418.67 failed, quitting
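
To check whether a given machine falls in the affected range reported above, a minimal sketch could compare the kernel's major.minor version against 5.1. The function name kernel_at_least is hypothetical, and the 5.1 threshold comes only from the two kernels tested in this post (5.0.17 builds, 5.1.15 does not).

```shell
#!/bin/sh
# Sketch: returns success if a kernel release string is at or above a given
# major.minor. Assumes the release starts with "<major>.<minor>", as in the
# Fedora releases quoted in this thread.
kernel_at_least() {
    rel=$1                      # e.g. the output of `uname -r`
    major=${rel%%.*}            # text before the first dot
    rest=${rel#*.}              # text after the first dot
    minor=${rest%%.*}           # text up to the next dot
    minor=${minor%%[!0-9]*}     # drop any trailing non-digit suffix
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kernel_at_least "$(uname -r)" 5 1; then
    echo "kernel >= 5.1: the 418.67 nvidia-uvm build may fail as shown above"
else
    echo "kernel < 5.1: outside the range reported in this thread"
fi
```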

This is a fairly common occurrence. The driver that gets built into the installer represents our best knowledge and tested capability at the time. But things change.

When you have a driver install problem like this with an older driver, regardless of source, the usual advice is to try the latest driver. For example, R430 drivers are available now.

If you find that an R430 (or whatever is the latest) driver can install successfully, then use it. It will work with any current version of CUDA. Furthermore, NVIDIA is highly unlikely, based on historical behavior, to go back and “fix” or update an old driver, or a driver in an “old” CUDA package, to address such things.

If you find that you cannot install the latest driver on the late-model kernel you are using, the suggested next step is to file a bug. (And of course you can post about it.)

Thanks for the reply, Robert!

Just wanted to ask, as I am a little bit confused:

How can I obtain a newer CUDA driver installation, then?

Is there any reason why old and unsupported drivers are being provided on the official pages? :)

https://www.nvidia.com/Download/index.aspx?lang=en-us
is just the NVIDIA driver (without CUDA), imho. Is there a process for installing the NVIDIA drivers and then the CUDA drivers separately that I could follow (if that is the case)?

http://www.nvidia.com/drivers

You may call it whatever you wish. It certainly does not have CUDA included, since it is just drivers. But it will work fine with any version of CUDA, as I’ve stated already. It’s OK if you don’t believe me.
Regardless of what you choose to call it, it is what I was referring to when I referred to newer drivers and R430 drivers.

They are only “unsupported” when used outside of the defined support for CUDA, which is outlined in the CUDA linux install guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Read table 1. There is no kernel 5.x listed as supported anywhere in that table.

Your configuration with that kernel is unsupported by any version of CUDA, and this is plainly documented.

Clear! Thanks for the explanation!

Sorry, I never said I don’t believe you; I was just confused. All good!

These are condensed instructions to use a separately installed driver.

  1. Find the driver you wish to use from the source indicated above.
  2. Install it.
  3. Run the CUDA toolkit installer of your choice:
     • If using a package manager, replace the instruction to install cuda with one to install cuda-toolkit, or similar.
     • If using the runfile installer, manually deselect the option to install the driver (e.g. answer “no”, or similar).
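
For the runfile route, the steps above can be sketched roughly as follows. This is a dry-run sketch under assumptions, not an official procedure: the driver file name is guessed from the 418.67 naming in the log earlier in the thread, the CUDA runfile name is a hypothetical placeholder, and the --silent --toolkit flags are as I recall them from the CUDA 10.1 runfile installer, so confirm them with the installer's --help first. The run helper is an illustrative convention that prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumed file names; substitute whatever you actually downloaded.
DRIVER_RUNFILE="NVIDIA-Linux-x86_64-430.26.run"   # assumption: 430.26 driver runfile
CUDA_RUNFILE="cuda_10.1_linux.run"                # hypothetical placeholder name

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_driver_then_toolkit() {
    # Step 2: install the standalone driver first.
    run sh "./$DRIVER_RUNFILE"
    # Step 3 (runfile route): install only the toolkit, skipping the bundled driver.
    run sh "./$CUDA_RUNFILE" --silent --toolkit
}

install_driver_then_toolkit
```

Running the runfile interactively and deselecting the driver component achieves the same thing as the second command.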

It’s always a good idea to be thoroughly familiar with the contents of the linux install guide I linked above.

Awesome, thanks for the help, Robert!

Updated to 430.26 and everything works under kernel 5.1.

I had to temporarily change to the multi-user target; otherwise I was getting a frozen boot screen (even switching to a different console with F2-F12 did not work).

systemctl set-default multi-user.target
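
Since the change was only temporary, here is a minimal sketch of the round trip. It assumes graphical.target was the default before the switch; the function names and the dry-run run helper are illustrative only. By default it just prints the commands; set DRY_RUN=0 and run as root to apply them.

```shell
#!/bin/sh
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Boot to a text console so the driver can be installed without the GUI running.
boot_to_text_mode() {
    run systemctl set-default multi-user.target
}

# Restore the graphical login once the driver is installed.
# Assumption: graphical.target was the original default.
restore_graphical_boot() {
    run systemctl set-default graphical.target
}

boot_to_text_mode
# ... reboot, install the driver, then:
restore_graphical_boot
```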

I re-ran the CUDA 10.1 installer with driver installation unchecked. It finished with an error; I couldn’t find anything meaningful in the logs besides millions of messages like this:

[ERROR]: boost::filesystem::remove: Directory not empty: "/var/log/nvidia/.uninstallManifests/"

Anyway, CUDA either got updated or works from the previous installation, which is great.

My initial confusion came from (wrongly) thinking that the CUDA driver is somehow a different driver from the “normal” one, in the same way that we had (have?) Quadro OpenGL drivers and NVIDIA drivers as separate things. Knowing that CUDA is just an additional set of libraries that works together with the ‘default’ NVIDIA driver helps, and everything works just fine.

Cheers!