CUDA Repo. Update Issues - NVIDIA-RedHat Linux

We have NVIDIA software (the driver and CUDA) installed on our Red Hat Linux server. During server updates we have run into a number of CUDA dependency errors. The logs below show the details:

Error:
Problem 1: package nvidia-open-3:560.28.03-1.noarch from cuda-rhel9-x86_64 requires nvidia-kmod >= 3:560.28.03, but none of the providers can be installed

  • cannot install the best update candidate for package cuda-drivers-555.42.06-1.x86_64
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 2: package nvidia-kmod-common-3:560.28.03-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:560.28.03, but none of the providers can be installed
  • cannot install the best update candidate for package nvidia-kmod-common-3:555.42.06-1.el9.noarch
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 3: package cuda-12.6.0-1.x86_64 from cuda-rhel9-x86_64 requires nvidia-open >= 560.28.03, but none of the providers can be installed
  • package nvidia-open-3:560.28.03-1.noarch from cuda-rhel9-x86_64 requires nvidia-kmod >= 3:560.28.03, but none of the providers can be installed
  • cannot install the best update candidate for package cuda-12.5.1-1.x86_64
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 4: package nvidia-driver-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:560.28.03, but none of the providers can be installed
  • package nvidia-kmod-common-3:560.28.03-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:560.28.03, but none of the providers can be installed
  • cannot install the best update candidate for package nvidia-driver-3:555.42.06-1.el9.x86_64
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 5: package nvidia-driver-cuda-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:560.28.03, but none of the providers can be installed
  • package nvidia-kmod-common-3:560.28.03-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:560.28.03, but none of the providers can be installed
  • cannot install the best update candidate for package nvidia-driver-cuda-3:555.42.06-1.el9.x86_64
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 6: package nvidia-driver-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:560.28.03, but none of the providers can be installed
  • package nvidia-settings-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver(x86-64) = 3:560.28.03, but none of the providers can be installed
  • package nvidia-kmod-common-3:560.28.03-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:560.28.03, but none of the providers can be installed
  • cannot install the best update candidate for package nvidia-settings-3:555.42.06-1.el9.x86_64
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
Problem 7: package nvidia-driver-cuda-3:555.42.06-1.el9.x86_64 from @System requires nvidia-driver-cuda-libs(x86-64) = 3:555.42.06, but none of the providers can be installed
  • package cuda-drivers-555.42.06-1.x86_64 from @System requires nvidia-driver-cuda >= 3:555.42.06, but none of the providers can be installed
  • package nvidia-driver-cuda-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod-common = 3:560.28.03, but none of the providers can be installed
  • cannot install both nvidia-driver-cuda-libs-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 and nvidia-driver-cuda-libs-3:555.42.06-1.el9.x86_64 from @System
  • package cuda-runtime-12-1-12.1.1-1.x86_64 from @System requires cuda-drivers >= 530.30.02, but none of the providers can be installed
  • package cuda-drivers-3:560.28.03-1.x86_64 from cuda-rhel9-x86_64 requires nvidia-kmod >= 3:560.28.03, but none of the providers can be installed
  • package nvidia-kmod-common-3:560.28.03-1.el9.noarch from cuda-rhel9-x86_64 requires nvidia-kmod = 3:560.28.03, but none of the providers can be installed
  • cannot install the best update candidate for package nvidia-driver-cuda-libs-3:555.42.06-1.el9.x86_64
  • cannot install the best update candidate for package cuda-runtime-12-1-12.1.1-1.x86_64
  • package cuda-drivers-530.30.02-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.104.05-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.104.12-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.129.03-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.154.05-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.161.07-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.161.08-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.183.01-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.183.06-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.54.03-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-535.86.10-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-545.23.06-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-545.23.08-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-550.54.14-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-550.54.15-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-550.90.07-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-555.42.02-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package cuda-drivers-555.42.06-1.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-latest-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package kmod-nvidia-open-dkms-3:560.28.03-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package nvidia-driver-cuda-3:555.42.06-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  • package nvidia-driver-cuda-libs-3:555.42.06-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
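
The same "filtered out by modular filtering" entries repeat under every problem, which makes the log hard to scan. A minimal sketch for collapsing them into a unique list, assuming the dnf output was first saved to a text file (the helper name `filtered_packages` and the file name are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: summarize which packages dnf excluded through modular filtering.
# Assumption: the dnf error output was saved to a text file first.

filtered_packages() {
    # $1: path to the saved dnf output
    awk '/is filtered out by modular filtering/ {
        # The package name-version string is the field right after the word "package".
        for (i = 1; i < NF; i++) {
            if ($i == "package") { print $(i + 1); break }
        }
    }' "$1" | sort -u
}

# Usage (hypothetical file name):
#   filtered_packages dnf-update.log
```

This only reads the saved text, so it is safe to run on a production host; it does not query or change the package database.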
1. Below is the CUDA package information from our server:

dnf info cuda

Updating Subscription Management repositories.
Last metadata expiration check: 0:51:26 ago on Wed 21 Aug 2024 09:34:56 AM EDT.
NVIDIA driver: some kernel packages have been filtered due to missing precompiled modules.
Please run “dnf nvidia-plugin” as a command to see a report on the filter being applied.
Installed Packages
Name : cuda
Version : 12.5.1
Release : 1
Architecture : x86_64
Size : 0.0
Source : cuda-12.5.1-1.src.rpm
Repository : @System
From repo : cuda-rhel9-x86_64
Summary : CUDA meta-package
URL : http://nvidia.com
License : NVIDIA Proprietary
Description : Meta-package containing all the available packages required for native CUDA
: development. Contains the toolkit, samples, driver and documentation.

2. From our troubleshooting, we suspect that the conflicting packages come only from the CUDA repository:

2.1
cat sos_commands/dnf/dnf_list_installed | grep cuda | head   <=== Conflicting packages are in the cuda repo only
cuda.x86_64 12.5.1-1 @cuda-rhel9-x86_64
cuda-12-1.x86_64 12.1.1-1 @cuda-rhel9-x86_64
cuda-12-2.x86_64 12.2.2-1 @cuda-rhel9-x86_64
cuda-12-3.x86_64 12.3.2-1 @cuda-rhel9-x86_64

2.2
repo id                            repo name
cuda-rhel9-x86_64                  cuda-rhel9-x86_64   <===== This is the cuda repo
epel                               Extra Packages for Enterprise Linux 8 - x86_64
packages-microsoft-com-prod        packages-microsoft-com-prod
rhel-9-for-x86_64-appstream-rpms   Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms      Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)

3. If we disable the CUDA repository "cuda-rhel9-x86_64", the updates complete without any issue. Enabling it again produces a number of modular dependency issues.
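
For reference, skipping a repository for a single transaction can be done with dnf's `--disablerepo` option, without editing the `.repo` file. A minimal sketch that only prints the command for review (the helper name `update_without_repo` is hypothetical; the repo id is the one from the list in 2.2):

```shell
#!/usr/bin/env bash
# Sketch: build the dnf command that updates the system while skipping one repository.
# The helper only prints the command; nothing is installed or changed.

update_without_repo() {
    # $1: repository id to skip for this one transaction
    local repo_id="$1"
    # --disablerepo applies to this invocation only; the repo stays enabled on disk.
    echo "dnf update --disablerepo=${repo_id}"
}

# Print the command, then run it manually (with sudo) once it looks right.
update_without_repo cuda-rhel9-x86_64
```

This is a workaround rather than a fix: while the repository is skipped, the CUDA and driver packages receive no updates.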

Could you please help us fix this issue? These are production servers, and we want to prevent this problem during our next server updates.

On the face of it, this is similar to [Issues with cuda-12.6.0-1.x86_64 from RHEL8 repo] from September the 4th, and to other threads.

It seems to me that it would be helpful if someone from Nvidia could provide the proper way out of this issue.

I don't know enough to say this is OK for everyone, but I ran:
sudo dnf module install nvidia-driver:latest-dkms
then redid the upgrade, and despite the remaining complaints it all seemed to work.

To get rid of the error messages on every dnf upgrade, I eventually decided to:

sudo dnf remove cuda
sudo dnf install cuda-toolkit
(this installs everything but the driver)
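
One way to confirm the result of that swap is to check the installed package names afterwards. A minimal sketch, assuming the names are piped in one per line (for example from `rpm -qa --qf '%{NAME}\n'`); the helper name `check_swap` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: verify that the cuda meta-package is gone and cuda-toolkit is present.
# Reads installed package names on stdin, one per line.

check_swap() {
    local names
    names=$(cat)
    # grep -x matches whole lines only, so "cuda-12-1" does not count as "cuda".
    if printf '%s\n' "$names" | grep -qx 'cuda'; then
        echo "cuda meta-package is still installed"
        return 1
    fi
    if ! printf '%s\n' "$names" | grep -qx 'cuda-toolkit'; then
        echo "cuda-toolkit is not installed"
        return 1
    fi
    echo "ok: cuda-toolkit present, cuda meta-package absent"
}

# Usage on the server:
#   rpm -qa --qf '%{NAME}\n' | check_swap
```

The check says nothing about the driver packages themselves, which this approach leaves to be managed separately.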

Thank you @ss_various_email for your suggestion, appreciate it.

I'm wondering if someone from NVIDIA could provide feedback on our problem?

I have often noticed that on the [CUDA Programming and Performance] sub-forum people seem to leap (relatively speaking) to answer what look to me like the most specific and obscure questions, yet on this sub-forum very few definitive answers ever arrive, particularly from Nvidia themselves. Yet if you can't get CUDA to work properly, or at all, the questions on [CUDA Programming and Performance] are not likely to be a problem! This is not intended as a criticism of individuals at all.

It's all very well for someone to suggest removing cuda, getting the driver sorted, and then installing cuda-toolkit. However, it would be really useful to have Nvidia confirm that this is a good method and that you would end up with exactly what you had before, working in the same way with no loss of data or configuration (or, within reason, what you would need to save or fix afterwards). It would also be useful to have confirmation that, as far as anyone can reasonably predict, if you install cuda-toolkit then, aside from needing to get the driver right separately, dnf update wouldn't get into this sort of mess. Yes, I have read the installation docs, but I am still not certain.