installation fails with kernels >= 5.1.x

tmdag · July 4, 2019, 1:40am

The same installation runs fine and finishes successfully on kernels < 5.1 (tested on 5.0.17-200.fc29.x86_64), but it fails on kernels >= 5.1 (currently testing 5.1.15-200.fc29.x86_64):

[INFO]: 
[INFO]: ERROR: An error occurred while performing the step: "Checking to see whether the nvidia-uvm kernel module was successfully built". See /var/log/nvidia-installer.log for details.
[INFO]: 
[INFO]: 
[INFO]: ERROR: The nvidia-uvm kernel module was not created.
[INFO]: 
[INFO]: 
[INFO]: ERROR: The nvidia-uvm kernel module failed to build. This kernel module is required for the proper operation of CUDA. If you do not need to use CUDA, you can try to install this driver package again with the '--no-unified-memory' option.
[INFO]: 
[INFO]: 
[INFO]: ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[INFO]: 
[INFO]: The command `cd ./kernel; /usr/bin/make -k -j24 NV_KERNEL_MODULES="nvidia-uvm" NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/5.1.15-200.fc29.x86_64/source" SYSOUT="/lib/modules/5.1.15-200.fc29.x86_64/build"` failed with the following output:
[INFO]: 
[INFO]: make[1]: Entering directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]: make[2]: Entering directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]:   CC [M]  /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.o
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:187:14: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .fault = uvm_vm_fault_sigbus_wrapper
[INFO]:               ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:187:14: note: (near initialization for ‘uvm_vm_ops_disabled.fault’)
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:525:21: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .fault        = uvm_vm_fault_wrapper,
[INFO]:                      ^~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:525:21: note: (near initialization for ‘uvm_vm_ops_managed.fault’)
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:526:21: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .page_mkwrite = uvm_vm_fault_wrapper,
[INFO]:                      ^~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:526:21: note: (near initialization for ‘uvm_vm_ops_managed.page_mkwrite’)
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:609:21: error: initialization of ‘vm_fault_t (*)(struct vm_fault *)’ {aka ‘unsigned int (*)(struct vm_fault *)’} from incompatible pointer type ‘int (*)(struct vm_fault *)’ [-Werror=incompatible-pointer-types]
[INFO]:      .fault        = uvm_vm_fault_sigbus_wrapper,
[INFO]:                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO]: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.c:609:21: note: (near initialization for ‘uvm_vm_ops_semaphore_pool.fault’)
[INFO]: cc1: warning: unrecognized command line option ‘-Wno-address-of-packed-member’
[INFO]: cc1: some warnings being treated as errors
[INFO]: make[3]: *** [/usr/src/kernels/5.1.15-200.fc29.x86_64/scripts/Makefile.build:275: /tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel/nvidia-uvm/uvm8.o] Error 1
[INFO]: make[3]: Target '__build' not remade because of errors.
[INFO]: make[2]: *** [/usr/src/kernels/5.1.15-200.fc29.x86_64/Makefile:1575: _module_/tmp/selfgz8069/NVIDIA-Linux-x86_64-418.67/kernel] Error 2
[INFO]: make[2]: Target 'modules' not remade because of errors.
[INFO]: make[2]: Leaving directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]: make[1]: *** [Makefile:169: sub-make] Error 2
[INFO]: make[1]: Target 'modules' not remade because of errors.
[INFO]: make[1]: Leaving directory '/usr/src/kernels/5.1.15-200.fc29.x86_64'
[INFO]: make: *** [Makefile:81: modules] Error 2
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 418.67 failed, quitting
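The failure above appears to depend only on the kernel version: the same 418.67 package builds on 5.0.17 and fails on 5.1.15. A pre-install check can therefore compare the running kernel (uname -r) against 5.1 before you use the bundled driver. This is a minimal sketch that assumes GNU sort -V for version ordering. The function name version_ge is hypothetical, and 5.1 is only the threshold observed in this thread, not an official cutoff.

```shell
#!/bin/sh
# Hypothetical pre-install check: is the running kernel at or above the
# version where the 418.67 nvidia-uvm build failed in this thread?

# version_ge A B: exit 0 when version A is greater than or equal to version B.
# Uses GNU sort -V (version sort): if B sorts first (or ties with A), then A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

kernel=$(uname -r)
if version_ge "$kernel" 5.1; then
    echo "kernel $kernel is >= 5.1: the 418.67 driver's nvidia-uvm module may fail to build; try R430 or newer"
else
    echo "kernel $kernel is < 5.1"
fi
```

Version sorting avoids splitting Fedora-style strings such as 5.1.15-200.fc29.x86_64 on dots and dashes by hand, and it also orders 5.10 after 5.1 correctly.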

Robert_Crovella · July 4, 2019, 1:48am

This is a fairly common occurrence. The driver that gets built into the installer represents our best knowledge and tested capability at the time. But things change.

When you have a driver install problem like this with an older driver, regardless of its source, the usual advice is to try the latest driver. For example, R430 drivers are available now.

If you find that an R430 (or whatever is the latest) driver can install successfully, then use it. It will work with any current version of CUDA. Furthermore, NVIDIA is highly unlikely, based on historical behavior, to go back and “fix” or update an old driver, or a driver in an “old” CUDA package, to address such things.

If you find that you cannot install the latest driver on the late-model kernel you are using, the suggested next step is to file a bug. (And of course you can post about it.)

tmdag · July 4, 2019, 2:12am

Thanks for the reply, Robert!

I just wanted to ask, as I am a little bit confused:

Those drivers are the ‘current latest’ drivers that you guys have available under
CUDA Toolkit 11.7 Update 1 Downloads | NVIDIA Developer

How can I obtain a newer CUDA driver installation, then?

Is there any reason why old and unsupported drivers are being provided on the official pages? :)

That is just the NVIDIA driver (without CUDA), IMHO. If that is the case, is there a process I could follow to install the NVIDIA drivers separately and then the CUDA drivers separately?

Robert_Crovella · July 4, 2019, 2:19am

http://www.nvidia.com/drivers

You may call it whatever you wish. It certainly does not include CUDA, since it is just a driver. But it will work fine with any version of CUDA, as I’ve stated already. It’s OK if you don’t believe me.
Regardless of what you choose to call it, it is what I meant when I referred to newer drivers and R430 drivers.

They are only “unsupported” when used outside of the defined support for CUDA, which is outlined in the CUDA Linux install guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Read Table 1. No 5.x kernel is listed as supported anywhere in that table.

Your configuration with that kernel is unsupported by any version of CUDA, and this is plainly documented.

tmdag · July 4, 2019, 2:21am

Clear! Thanks for the explanation!

Sorry, I never said I didn’t believe you; I was just confused. All good!

Robert_Crovella · July 4, 2019, 6:40pm

These are condensed instructions for using a separately installed driver:

1. Find the driver you wish to use from the source indicated above.
2. Install it.
3. Run the CUDA toolkit installer of your choice:
   - If using the package manager, replace the instruction to install cuda with one to install cuda-toolkit (or similar).
   - If using the runfile installer, manually deselect the option to install the driver (e.g. answer “no”, or similar).
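The runfile path of these steps can be scripted roughly as follows. This is a sketch, not an official procedure: it assumes you run it as root from a text console and pass the two downloaded .run files as arguments, and the function name install_driver_then_toolkit is hypothetical. The --silent and --toolkit options are the CUDA runfile's non-interactive way of installing only the toolkit, which corresponds to deselecting the driver in the interactive installer. Setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: standalone driver first, then the CUDA toolkit without its bundled driver.
# Usage (as root): install_driver_then_toolkit DRIVER_RUNFILE CUDA_RUNFILE
# With DRY_RUN set to a non-empty value, commands are printed instead of executed.
install_driver_then_toolkit() {
    if [ "$#" -ne 2 ]; then
        echo "usage: install_driver_then_toolkit DRIVER_RUNFILE CUDA_RUNFILE" >&2
        return 2
    fi
    # Steps 1-2: install the driver downloaded from nvidia.com/drivers (e.g. an R430 .run file).
    ${DRY_RUN:+echo} sh "$1" || return 1
    # Step 3: run the CUDA runfile with only the toolkit selected, i.e. the driver deselected.
    ${DRY_RUN:+echo} sh "$2" --silent --toolkit
}
```

For example, DRY_RUN=1 install_driver_then_toolkit NVIDIA-Linux-x86_64-430.26.run followed by the name of your CUDA runfile shows both commands without executing anything. On the package-manager path, the equivalent of step 3 is installing the cuda-toolkit package rather than cuda.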

It’s always a good idea to be thoroughly familiar with the contents of the Linux install guide I linked above.

tmdag · July 4, 2019, 9:17pm

Awesome, thanks for the help, Robert!

tmdag · July 5, 2019, 7:43am

Updated to 430.26, and everything works under kernel 5.1.

I had to temporarily switch to the multi-user target; otherwise I was getting a frozen boot screen (even switching to a different console with F2-F12 did not work):

systemctl set-default multi-user.target
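Because the switch is meant to be temporary, the default target has to be set back to graphical.target once the driver is installed. Here is a small sketch that wraps both directions. It assumes systemd and root privileges, and the function name set_boot_target is hypothetical. Setting DRY_RUN only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: choose which systemd target the machine boots into.
#   set_boot_target multi-user   -> text console, e.g. while (re)installing the driver
#   set_boot_target graphical    -> back to the desktop afterwards
# With DRY_RUN set to a non-empty value, the command is printed instead of executed.
set_boot_target() {
    case "$1" in
        multi-user|graphical) ;;
        *) echo "usage: set_boot_target multi-user|graphical" >&2; return 2 ;;
    esac
    ${DRY_RUN:+echo} systemctl set-default "$1.target"
}
```

systemctl get-default shows the current setting. A change made with set-default takes effect on the next boot.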

I re-ran the CUDA 10.1 installer with driver installation unchecked. It finished with an error, and I couldn’t find anything meaningful in the logs besides millions of messages like this one:

[ERROR]: boost::filesystem::remove: Directory not empty: "/var/log/nvidia/.uninstallManifests/"

Anyway, CUDA either got updated or still works from the previous installation, which is great.

My initial confusion came from (wrongly) thinking that the CUDA driver is somehow a different driver from the “normal” one, in the same way that we had (have?) Quadro OpenGL drivers and NVIDIA drivers as separate things. Knowing that CUDA is just an additional set of libraries that works together with the ‘default’ NVIDIA driver helps, and everything works just fine.
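The driver and the CUDA toolkit report their versions separately, which reflects this split. The sketch below prints both versions. It assumes nvidia-smi (installed with the driver) and nvcc (installed with the toolkit) are on PATH, and it reports "not found" for either tool otherwise. The function name report_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the driver version and the CUDA toolkit version side by side.
report_versions() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU; the driver version is the same for all, so keep the first.
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    else
        echo "driver: nvidia-smi not found"
    fi
    if command -v nvcc >/dev/null 2>&1; then
        # nvcc --version includes a line such as "Cuda compilation tools, release 10.1, V10.1.105".
        echo "toolkit: $(nvcc --version | grep -o 'release [0-9.]*' | head -n 1)"
    else
        echo "toolkit: nvcc not found (the toolkit usually lives under /usr/local/cuda/bin)"
    fi
}
```

With the setup described in this thread, you would expect a 430.x driver line next to a CUDA 10.1 toolkit line, because the newer driver serves the older toolkit.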

Cheers!
