Driver installation is failing

I am getting the following error on RHEL 9 when I follow the instructions on the CUDA Toolkit 12.6 Update 3 Downloads | NVIDIA Developer page:
LINUX->x86_64->rhel->9
The toolkit installs fine.
But when I install the driver:

sudo dnf -y module install nvidia-driver:latest-dkms

It fails as follows:
Error:
Problem 1: conflicting requests

  • nothing provides dkms needed by kmod-nvidia-latest-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
    Problem 2: package nvidia-kmod-common-3:565.57.01-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:565.57.01, but none of the providers can be installed
  • conflicting requests
  • package kmod-nvidia-565.57.01-5.14.0-427.42.1-3:565.57.01-3.el9_4.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-565.57.01-5.14.0-503.14.1-3:565.57.01-3.el9_5.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-latest-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
  • package kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
    Problem 3: package nvidia-driver-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:565.57.01, but none of the providers can be installed
  • package nvidia-kmod-common-3:565.57.01-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:565.57.01, but none of the providers can be installed
  • conflicting requests
  • package kmod-nvidia-565.57.01-5.14.0-427.42.1-3:565.57.01-3.el9_4.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-565.57.01-5.14.0-503.14.1-3:565.57.01-3.el9_5.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-latest-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
  • package kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
    Problem 4: package nvidia-driver-cuda-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:565.57.01, but none of the providers can be installed
  • package nvidia-kmod-common-3:565.57.01-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:565.57.01, but none of the providers can be installed
  • conflicting requests
  • package kmod-nvidia-565.57.01-5.14.0-427.42.1-3:565.57.01-3.el9_4.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-565.57.01-5.14.0-503.14.1-3:565.57.01-3.el9_5.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-latest-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
  • package kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
    Problem 5: package nvidia-driver-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:565.57.01, but none of the providers can be installed
  • package nvidia-settings-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver(x86-64) = 3:565.57.01, but none of the providers can be installed
  • package nvidia-kmod-common-3:565.57.01-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:565.57.01, but none of the providers can be installed
  • conflicting requests
  • package kmod-nvidia-565.57.01-5.14.0-427.42.1-3:565.57.01-3.el9_4.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-565.57.01-5.14.0-503.14.1-3:565.57.01-3.el9_5.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-latest-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
  • package kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • nothing provides dkms needed by kmod-nvidia-open-dkms-3:565.57.01-1.el9.x86_64 from cuda-rhel9-x86_64
    (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Forgive me if this isn't helpful, as I don't use RHEL and you didn't say what card you have!

The guide I linked may be of more help, as it looks as if you may not have dkms installed. Try:
$ dnf list dkms
You should see:
Installed Packages
dkms.noarch 3.1.1-1.fc40 @updates

or similar. If not, then section 5.1 of that guide might help.
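As a sketch of that check, the function below reads `dnf list dkms` output and succeeds only if dkms is listed under "Installed Packages". The function name is hypothetical, and the EPEL commands in the comments are an assumption based on the usual EPEL setup for RHEL 9 (a later reply in this thread also found that dkms needs EPEL on RHEL).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if the `dnf list dkms` output
# on stdin lists dkms under "Installed Packages".
dkms_is_installed() {
  awk '
    /^Installed Packages/ { section = "installed"; next }
    /^Available Packages/ { section = "available"; next }
    section == "installed" && $1 ~ /^dkms\./ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example use on the affected machine:
#   if dnf list dkms 2>/dev/null | dkms_is_installed; then
#     echo "dkms is installed"
#   else
#     echo "dkms is missing"
#   fi
#
# Assumption: on RHEL 9, dkms is not in the base repositories and comes
# from EPEL, which can be enabled with:
#   sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
#   sudo dnf install dkms
```

A package that only shows up under "Available Packages" is not installed yet, which is why the sketch tracks the section header rather than just grepping for "dkms".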

The card you are using determines whether you should go proprietary or open. The guide I pointed you to seems to be a mix of general and datacentre methods.
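One way to see which card you have is to filter `lspci` output for NVIDIA display controllers. The sketch below does that and maps the chip codename prefix to a suggested module flavor. The helper names are hypothetical, and the mapping is an assumption covering only a few architectures: the open kernel modules need Turing or newer, so Pascal cards such as a 1050 Ti need the proprietary module. If `lspci` shows "Device xxxx" instead of a codename, the PCI ID database is out of date and the sketch reports "unknown".

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing between the open and proprietary
# kernel modules.

# Print the NVIDIA GPU descriptions from `lspci` output on stdin,
# skipping non-GPU functions such as the HDMI audio controller.
list_nvidia_gpus() {
  grep -E '(VGA compatible controller|3D controller): NVIDIA Corporation ' \
    | sed -E 's/^.*: NVIDIA Corporation //'
}

# Map a description such as "TU106 [GeForce RTX 2070] (rev a1)" to a
# suggested module flavor, based on the chip codename prefix.
# Assumption: Turing (TU), Ampere (GA) and Ada (AD) work with either module;
# Volta (GV) and older, including Pascal (GP), need the proprietary module.
module_flavor() {
  case "$1" in
    TU* | GA* | AD*)       echo "open-or-proprietary" ;;
    GV* | GP* | GM* | GK*) echo "proprietary-only" ;;
    *)                     echo "unknown" ;;
  esac
}

# Example use:
#   lspci | list_nvidia_gpus | while IFS= read -r gpu; do
#     printf '%s -> %s\n' "$gpu" "$(module_flavor "$gpu")"
#   done
```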

Personally, I could only get my installation (admittedly on Fedora) to work by using:

sudo dnf install nvidia-driver:latest-dkms
(proprietary, as my card is too old for open; and leave "-y module" out of it)

sudo dnf install cuda-toolkit
(this seems to install everything but leaves the driver alone)

Hope this helps

I feel it would be really helpful if someone from Nvidia who actually understands this stuff could get involved in these setup issues.

OK, I also figured out that I have to install dkms, but on RHEL that requires installing the EPEL repository first. Now the driver installs, but "sudo dkms status" shows "added", not "installed".
So I deliberately built and installed the driver using dkms:
sudo dkms build -m <module/version>
sudo dkms install -m <module/version>
Both went OK, but once they were done and I tried loading it (sudo modprobe nvidia), it said /lib/module- does not have the driver!
That makes no sense, because I built and installed it for this kernel version.
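modprobe only searches under /lib/modules/<running kernel>, so if dkms built and installed the module for a different kernel release than the one that is booted, loading fails even though the dkms steps succeeded. A minimal sketch for checking that, assuming the module lands in /lib/modules/<kernel>/extra/ as in the dkms output later in this thread; the function name and the MODULES_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nvidia kernel module(s) that modprobe could
# load for a given kernel (default: the running kernel). MODULES_ROOT exists
# only so the function can be tried against a scratch directory.
nvidia_module_path() {
  local kernel="${1:-$(uname -r)}"
  local root="${MODULES_ROOT:-/lib/modules}"
  local found
  # nvidia.ko* matches nvidia.ko and compressed variants such as nvidia.ko.xz,
  # but not nvidia-drm.ko or nvidia-uvm.ko.
  found=$(find "$root/$kernel" -name 'nvidia.ko*' -print 2>/dev/null)
  if [ -z "$found" ]; then
    echo "no nvidia module under $root/$kernel" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Example use: compare the running kernel with the kernel dkms built for.
#   uname -r
#   sudo dkms status
#   nvidia_module_path || echo "modprobe nvidia is likely to fail on $(uname -r)"
```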

Below are some notes that I made after not noticing the open driver problem (you would need to substitute the driver version for the one you are getting):

Oh dear, the open drivers are not compatible with Pascal, so back to the proprietary driver.

#do not reboot until after these 3 commands

sudo dnf module reset nvidia-driver

sudo dnf module install nvidia-driver:latest-dkms --allowerasing

result:

Installed:
kmod-nvidia-latest-dkms-3:560.28.03-1.fc39.x86_64
Removed:
kmod-nvidia-open-dkms-3:560.28.03-1.fc39.x86_64

Complete!

but:

dkms status
nvidia/560.28.03, 6.9.11-100.fc39.x86_64, x86_64: installed (original_module exists) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)

sudo dkms remove -m nvidia -v 560.28.03 --all

dkms status is now blank.

sudo dkms install -m nvidia -v 560.28.03

Sign command: /lib/modules/6.9.11-100.fc39.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Creating symlink /var/lib/dkms/nvidia/560.28.03/source -> /usr/src/nvidia-560.28.03

Building module:
Cleaning build area…
Building module(s)…
Signing module /var/lib/dkms/nvidia/560.28.03/build/nvidia.ko
Signing module /var/lib/dkms/nvidia/560.28.03/build/nvidia-modeset.ko
Signing module /var/lib/dkms/nvidia/560.28.03/build/nvidia-drm.ko
Signing module /var/lib/dkms/nvidia/560.28.03/build/nvidia-uvm.ko
Signing module /var/lib/dkms/nvidia/560.28.03/build/nvidia-peermem.ko
Cleaning build area…

nvidia.ko.xz:
Running module version sanity check.

  • Original module
    • Found /lib/modules/6.9.11-100.fc39.x86_64/extra/nvidia.ko.xz
    • Storing in /var/lib/dkms/nvidia/original_module/6.9.11-100.fc39.x86_64/x86_64/
    • Archiving for uninstallation purposes
  • Installation
    • Installing to /lib/modules/6.9.11-100.fc39.x86_64/extra/

dkms status
nvidia/560.28.03, 6.9.11-100.fc39.x86_64, x86_64: installed (original_module exists)

reboot ok and:

nvidia-smi
Thu Aug 1 23:40:55 2024
+------------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6 |
|-----------------------------------------+-------------------------+------------------

I regret that I can’t be certain this will help you, but nobody who could seems to want to.
Best of luck.
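The notes above hinge on reading `dkms status`: the "Diff between built and installed module" warning prompted `dkms remove --all`, and a clean "installed" line for the right kernel was the go-ahead to reboot. A minimal sketch that condenses each status line into fields, assuming the dkms 3.x line format shown in this thread; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn each `dkms status` line on stdin into
#   module version kernel state health
# where kernel is "-" for modules that are only "added" (not built),
# and health is "warning" if dkms printed a WARNING on that line.
parse_dkms_status() {
  awk '
    {
      colon = index($0, ": ")           # split "left: right" at the first ": "
      if (colon == 0) next
      left = substr($0, 1, colon - 1)
      right = substr($0, colon + 2)
      n = split(left, parts, ", ")      # module/version, kernel, arch
      split(parts[1], mv, "/")          # module, version
      kernel = (n >= 2) ? parts[2] : "-"
      split(right, words, " ")          # first word is the state
      health = (right ~ /WARNING/) ? "warning" : "ok"
      print mv[1], mv[2], kernel, words[1], health
    }
  '
}

# Example use before rebooting:
#   sudo dkms status | parse_dkms_status
```

An "added" row with "-" as the kernel means the module was registered but never built, which matches the "added, not installed" state reported earlier in this thread.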

I have similar logs; I have an RTX 2070.
Actually, one time the installation worked and I was able to load the driver and run nvidia-smi. It seems to be some sort of intermittent problem; I will check my script.

Another attempt gave yet another result: this time the build failed. I am not sure what is going on; I have never seen this with the CUDA driver:

In file included from ./include/linux/scatterlist.h:8,
                 from ./include/linux/dmapool.h:14,
                 from ./include/linux/pci.h:1700,
                 from /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv-pci-table.h:27,
                 from /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv-pci.c:24:
./include/linux/mm.h:2452:59: note: expected ‘struct page **’ but argument is of type ‘long unsigned int’
 2452 |                     unsigned int gup_flags, struct page **pages);
      |                                             ~~~~~~~~~~~~~~^~~~~
In file included from /var/lib/dkms/nvidia/535.54.03/build/common/inc/nv-linux.h:34,
                 from /var/lib/dkms/nvidia/535.54.03/build/common/inc/nv-pci.h:28,
                 from /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv-pci.c:26:
/var/lib/dkms/nvidia/535.54.03/build/common/inc/nv-mm.h:182:20: error: too many arguments to function ‘get_user_pages’
  182 |             return get_user_pages(NULL, mm, start, nr_pages, write, force, pages, vmas);
      |                    ^~~~~~~~~~~~~~
In file included from ./include/linux/scatterlist.h:8,
                 from ./include/linux/dmapool.h:14,
                 from ./include/linux/pci.h:1700,
                 from /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv-pci-table.h:27,
                 from /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv-pci.c:24:
./include/linux/mm.h:2451:6: note: declared here
 2451 | long get_user_pages(unsigned long start, unsigned long nr_pages,
      |      ^~~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:249: /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv-pci.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:249: /var/lib/dkms/nvidia/535.54.03/build/nvidia/nv.o] Error 1
make[1]: *** [Makefile:1942: /var/lib/dkms/nvidia/535.54.03/build] Error 2
make[1]: Leaving directory '/usr/src/kernels/5.14.0-533.el9.x86_64'
make: *** [Makefile:82: modules] Error 2
[nonroot@localhost cuda]$ sudo dkms status
nvidia/535.54.03: added
[nonroot@localhost cuda]$ nano -w setup-cuda.sh 
[nonroot@localhost cuda]$ uname -r
5.14.0-533.el9.x86_64

As the 2070 is a reasonably up-to-date card (it's newer than my 1050 Ti), I am surprised to see 535.54.03. Your kernel is reasonably up to date, I presume.

Does your kernel development setup match your running kernel?

Your original post looks pretty reasonable: latest-dkms was trying to install the 565…57 proprietary driver, which looks up to date.
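dkms compiles against the kernel source tree in /usr/src/kernels/<release> (the build log above shows make leaving /usr/src/kernels/5.14.0-533.el9.x86_64), and on RHEL that tree comes from the kernel-devel package. A minimal sketch of the check, with a hypothetical function name; the install line in the comments is an assumption mirroring the usual RHEL pre-installation step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the `rpm -q kernel-devel` output on stdin
# contains a kernel-devel package for the given kernel release.
has_matching_devel() {
  local running="$1"
  # -x: whole-line match, -F: fixed string (the dots are not regex wildcards)
  grep -qxF "kernel-devel-$running"
}

# Example use:
#   if ! rpm -q kernel-devel | has_matching_devel "$(uname -r)"; then
#     sudo dnf install "kernel-devel-$(uname -r)" "kernel-headers-$(uname -r)"
#   fi
```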

When I did:

sudo dnf module install nvidia-driver:latest-dkms --allowerasing

I got 560.35.05, and that is working. Is your repo OK?

What happens if you do this:

sudo dkms remove -m nvidia -v 535.54.03 --all

sudo dnf module reset nvidia-driver

sudo dnf module install nvidia-driver:latest-dkms --allowerasing

dkms status
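The sequence above hard-codes 535.54.03. A sketch that first removes every nvidia version registered with dkms, then runs the same reset and install steps; the function names are hypothetical, and the reinstall function is deliberately not called automatically because it needs root and changes the installed driver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each nvidia version registered with dkms,
# once, from `dkms status` output on stdin.
nvidia_dkms_versions() {
  sed -nE 's#^nvidia/([^,:]+)[,:].*#\1#p' | sort -u
}

# The steps suggested above, without a hard-coded version.
reinstall_latest_dkms() {
  local v
  local -a versions
  mapfile -t versions < <(sudo dkms status | nvidia_dkms_versions)
  for v in "${versions[@]}"; do
    sudo dkms remove -m nvidia -v "$v" --all
  done
  sudo dnf module reset nvidia-driver
  sudo dnf module install nvidia-driver:latest-dkms --allowerasing
  sudo dkms status
}
```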

I finally tried 12.6 instead of 12.2, and this combination appears to work, but I still would not count on it.
The way I got it working is weird. It would not build for kernel build 533, so I was building against the 407 build while it was booted, and the build completed. After a reboot, the Linux machine somehow booted into the 533 kernel, and dkms may have rebuilt the module, and that also worked. I am not sure, because when I deliberately built against 533, it broke.

nvidia-smi ; uname -r ; sudo dkms status
Thu Nov 28 08:56:27 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2070        Off |   00000000:01:00.0  On |                  N/A |
| 41%   52C    P8             10W /  185W |     134MiB /   8192MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 ...    Off |   00000000:02:00.0 Off |                  N/A |
| 41%   37C    P8              5W /  215W |       5MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3636      G   /usr/bin/gnome-shell                          122MiB |
|    1   N/A  N/A      3636      G   /usr/bin/gnome-shell                            2MiB |
+-----------------------------------------------------------------------------------------+
5.14.0-533.el9.x86_64
nvidia/565.57.01, 5.14.0-407.el9.x86_64, x86_64: installed
[guyen@localhost cuda]$ sudo yum list installed | egrep "nvidia-driver|cuda-toolkit"
cuda-toolkit-12-6.x86_64                         12.6.3-1                         @cuda-rhel9-x86_64    
cuda-toolkit-12-6-config-common.noarch           12.6.77-1                        @cuda-rhel9-x86_64    
cuda-toolkit-12-config-common.noarch             12.6.77-1                        @cuda-rhel9-x86_64    
cuda-toolkit-config-common.noarch                12.6.77-1                        @cuda-rhel9-x86_64    
nvidia-driver.x86_64                             3:565.57.01-1.el9                @cuda-rhel9-x86_64    
nvidia-driver-cuda.x86_64                        3:565.57.01-1.el9                @cuda-rhel9-x86_64    
nvidia-driver-cuda-libs.x86_64                   3:565.57.01-1.el9                @cuda-rhel9-x86_64    
nvidia-driver-libs.x86_64                        3:565.57.01-1.el9                @cuda-rhel9-x86_64
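In the output above, `uname -r` reports 5.14.0-533 while `dkms status` lists the nvidia module as installed only for 5.14.0-407. A minimal sketch that makes that kind of mismatch explicit; the function name is hypothetical, and the follow-up `dkms install` line is an assumption that also needs the matching kernel-devel package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `dkms status` output on stdin shows the
# nvidia module as installed for the given kernel release.
nvidia_installed_for() {
  local kernel="$1"
  grep -F "nvidia/" | grep -F ", $kernel, " | grep -q ': installed'
}

# Example use:
#   if ! sudo dkms status | nvidia_installed_for "$(uname -r)"; then
#     echo "dkms has no installed nvidia module for $(uname -r)"
#     # Possible follow-up (assumption):
#     #   sudo dkms install -m nvidia -v 565.57.01 -k "$(uname -r)"
#   fi
```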