CUDA 10 and 415 drivers on Ubuntu

I recently acquired an RTX 2060, so I need to upgrade my drivers from 410 (which came with CUDA 10) to 415 on a machine running Ubuntu 16.04.

The problem is that if I try to install the 415 drivers, it uninstalls CUDA 10, because the CUDA 10 DEB package lists the 410 drivers as a prerequisite.

So if I try to install the 415 driver DEB package, apt-get uninstalls the 410 driver package (which is correct), and because CUDA 10 lists the 410 driver as a prerequisite, apt-get then tries to uninstall CUDA 10 as well (which I obviously don’t want).
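The cascade described above is ordinary reverse-dependency removal: when a package goes away, apt also removes every installed package that depends on it, directly or transitively. A minimal Python sketch of that rule, assuming a simplified, hypothetical dependency graph (the package names and edges below are illustrative only; the real CUDA 10 DEB packages may be named and wired differently):

```python
from collections import deque

# Hypothetical, simplified dependency graph: package -> packages it depends on.
# The real CUDA 10 repository may name and connect its packages differently.
DEPENDS = {
    "cuda": ["cuda-10-0"],
    "cuda-10-0": ["cuda-toolkit-10-0", "cuda-drivers"],
    "cuda-drivers": ["nvidia-410"],
    "cuda-toolkit-10-0": [],
    "nvidia-410": [],
}


def removed_along_with(target, depends):
    """Return target plus every package that depends on it, directly or transitively."""
    # Invert the graph so we can walk from a package to the packages that need it.
    dependents = {pkg: [] for pkg in depends}
    for pkg, deps in depends.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)

    # Breadth-first walk over reverse dependencies.
    removed = {target}
    queue = deque([target])
    while queue:
        pkg = queue.popleft()
        for parent in dependents.get(pkg, []):
            if parent not in removed:
                removed.add(parent)
                queue.append(parent)
    return removed


if __name__ == "__main__":
    print(sorted(removed_along_with("nvidia-410", DEPENDS)))
```

In this model, removing nvidia-410 also pulls out cuda-drivers, cuda-10-0 and cuda, but leaves cuda-toolkit-10-0 alone, because the toolkit itself does not depend on the driver. That is the intuition behind installing only the toolkit package instead of the full cuda meta-package.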

Does anyone know a way around this, so that I end up with CUDA 10 and the 415 drivers?

I already tried the following:

  1. Uninstall CUDA via: apt-get remove cuda
  2. Install the 415 drivers (this also uninstalls the 410 drivers) via: apt-get install nvidia-415
  3. Install the CUDA 10 toolkit (i.e. exclude the 410 drivers in the DEB package) via: apt-get install cuda-toolkit-10.0

The above installs fine with no errors and, after a reboot, nvidia-smi correctly shows the 415 drivers installed.

I can then compile kernels with the installed CUDA 10 without any errors. The problem is that when I try to run any of the compiled kernels, I get an error saying the runtime driver version is older than the current system driver. This is probably because the CUDA 10 DEB package was built with the older 410 drivers, while the OS is running the 415 drivers.

So how do I get around this issue?

Your steps 1-3 are generally the correct method. However, I would modify step 2 slightly.

The problem may be that you have installed the GPU driver (nvidia-415) from a PPA archive. I always recommend installing NVIDIA drivers from NVIDIA sources (e.g. NVIDIA-provided deb archives or NVIDIA-provided runfile installers). The PPA archives are created by a third party, and there is no guarantee that they include all the driver components necessary for CUDA usage.

In any event, the reason you are running into this problem is that you have an incomplete or broken driver install.

Another possible reason for this is that some residual GPU driver files are still installed even after step 1 completes. This doesn’t surprise me, since you have evidently tried a number of things on this system. Everything you tried that installed, or attempted to install, a driver must be properly unwound to a clean state before a subsequent driver install (in step 2) can be presumed to work correctly. As you’ve already discovered, the simple fact that nvidia-smi runs correctly unfortunately doesn’t guarantee a proper driver install with respect to CUDA. (In all cases, it is a necessary condition, but not a sufficient one.)
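One way to look past nvidia-smi is to ask the CUDA driver library itself to initialise and report the CUDA version it supports, which is closer to what a CUDA application actually depends on. A minimal Python sketch using ctypes, assuming the driver library is installed as libcuda.so.1 (the usual name on Linux) and calling only the driver API functions cuInit and cuDriverGetVersion; the helper names are hypothetical, and the (10, 0) threshold simply reflects the CUDA 10.0 toolkit in this thread:

```python
import ctypes

CUDA_SUCCESS = 0


def decode_cuda_version(v):
    """CUDA encodes versions as 1000 * major + 10 * minor (e.g. 10000 -> (10, 0))."""
    return v // 1000, (v % 1000) // 10


def probe_driver(libname="libcuda.so.1"):
    """Return the (major, minor) CUDA version supported by the installed driver,
    or None if the driver library cannot be loaded or initialised."""
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None  # no loadable libcuda: the driver install is missing or broken
    if libcuda.cuInit(0) != CUDA_SUCCESS:
        return None  # library loads, but the driver cannot be initialised
    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != CUDA_SUCCESS:
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    supported = probe_driver()
    if supported is None:
        print("CUDA driver library not usable")
    elif supported >= (10, 0):
        print("driver supports CUDA %d.%d, enough for a CUDA 10.0 runtime" % supported)
    else:
        print("driver only supports CUDA %d.%d" % supported)
```

If nvidia-smi works but this probe returns None, that points at the kind of incomplete driver install described above rather than at a toolkit/driver version mismatch.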

The CUDA Linux installation guide gives general instructions under “handle conflicting installs…”, but it cannot cover every possible thing you may have done over the history of this system.

In some cases, simply doing a reboot fixes this, but you’ve probably already done that and I’m not sure it is the cause of what you are seeing.

In the worst case, a fresh install of the OS is guaranteed to return the system to a clean state.

I would recommend thoroughly cleaning the system and then repeating your steps 2-3, but using an NVIDIA source for the GPU driver (in step 2).

As a general rule, package names for CUDA or the GPU driver that start with nvidia-* are not created or maintained by NVIDIA.

Thanks for the reply. You’re right, it was most probably all the retries/re-installs that caused some sort of conflict. I finally got it working with the following steps, starting from the system’s state after all the retries/re-installs:

  1. Remove CUDA (again): apt-get remove cuda
  2. Get a list of all CUDA packages on the system: dpkg-query -l 'cuda*'
  3. For any CUDA packages from #2 that are not marked as ‘un’ or ‘rc’ (not installed, or removed with only config files left), remove them via: dpkg -r %{packagename}
  4. Get a list of all driver packages on the system: dpkg-query -l 'nvid*'
  5. Verify that only the 410 driver packages are marked as ‘ii’ (installed), and remove any other version using dpkg -r
  6. Install the CUDA 10 toolkit: apt-get install cuda-toolkit-10.0
  7. Install driver 415: apt-get install nvidia-415

The last step now correctly removes the 410 drivers but does not touch the CUDA 10 packages.

Everything now works as expected: nvidia-smi shows both the 415 driver and the CUDA 10 version, and kernels compile and run properly.

Thanks again for the help.