Runfile Installer error for CUDA 10.1 on Ubuntu 18.04

I carefully went through all the prerequisites on this Ubuntu 18.04 install with a GTX1050 and a 7700k CPU, and followed the pre-installation steps to the letter with no issues.
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#pre-installation-actions

Then, following the Runfile Installer instructions in section 4.1.5.2, I ran into trouble.
https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#ubuntu-x86_64-run

The runfile terminates with an error, so I inspected the attached /var/log/cuda-installer.log for issues. There seem to be several:

  1. The first is the warning below about the Nouveau driver. I don’t understand why it appears, since “lsmod” shows that Nouveau is not loaded.
[INFO]: WARNING: One or more modprobe configuration files to disable Nouveau are already present at: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.  Please be sure you have rebooted your system since  these files were written.  If you have rebooted, then Nouveau may be enabled for other reasons, such as being included in the system initial ramdisk or in your X configuration file.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
  2. Then this error:
[INFO]: ERROR: Unable to load the kernel module 'nvidia-modeset.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA GPU(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
  3. Which is enough to cause…
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 418.67 failed, quitting
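
The checks behind items 1 and 2 can be repeated from Python. This is a minimal diagnostic sketch, assuming Python 3 on the affected machine. /proc/modules is the kernel file that lsmod formats, /etc/modprobe.d is where the installer wrote its blacklist file, and all function names are hypothetical. It does not inspect the initramfs, which the warning names as another way Nouveau can stay enabled. On Ubuntu, the installation guide has you regenerate that with sudo update-initramfs -u after writing the blacklist file.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the module names listed in /proc/modules text (the data lsmod formats)."""
    # Each non-empty line starts with the module name, e.g.
    # "nvidia_uvm 794624 0 - Live 0x0000000000000000"
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_blacklist_files(conf_dir="/etc/modprobe.d"):
    """Return .conf files in conf_dir with an uncommented 'blacklist nouveau' line."""
    directory = Path(conf_dir)
    if not directory.is_dir():
        return []
    hits = []
    for path in sorted(directory.glob("*.conf")):
        try:
            lines = path.read_text(errors="replace").splitlines()
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole check
        # Drop anything after '#' so commented-out lines do not count.
        if any(line.split("#", 1)[0].split() == ["blacklist", "nouveau"] for line in lines):
            hits.append(str(path))
    return hits


def installer_errors(log_text):
    """Return installer log lines tagged [ERROR] or containing 'ERROR:'."""
    return [line for line in log_text.splitlines()
            if line.startswith("[ERROR]") or "ERROR:" in line]


def report(log_path="/var/log/cuda-installer.log"):
    """Print a short summary for the current machine."""
    proc = Path("/proc/modules")
    if proc.exists():  # only present on Linux
        print("nouveau loaded:", "nouveau" in loaded_modules(proc.read_text()))
    print("nouveau blacklist files:", nouveau_blacklist_files())
    try:
        for line in installer_errors(Path(log_path).read_text(errors="replace")):
            print(line)
    except OSError as exc:  # missing log file or insufficient permissions
        print(f"could not read {log_path}: {exc}")


if __name__ == "__main__":
    report()
```

If report() shows Nouveau as not loaded and the blacklist file present, the warning in item 1 is informational. That points the search toward the nvidia-modeset.ko error in item 2 instead.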

I’m not sure whether this is informative, but here is a snippet of dmesg from one of the runlevel-3 boots while attempting to use the Runfile Installer.

[   54.760499] nvidia-uvm: Unloaded the UVM driver in 8 mode
[   54.784146] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[   54.809272] nvidia-modeset: Unloading
[   54.841349] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
[   69.063004] VFIO - User Level meta-driver version: 0.3
[   69.086094] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[   69.086362] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[   69.185755] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  418.67  Sat Apr  6 03:07:24 CDT 2019
[   69.187775] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  418.67  Sat Apr  6 02:43:09 CDT 2019
[   69.189837] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 510
[   69.189906] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   69.189907] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
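
To read a longer dmesg capture the same way as the snippet above, the NVIDIA lines can be pulled out with their timestamps so the unload/load order is easy to follow. This is a minimal sketch. Treating any message that contains "NVRM" or "nvidia" as NVIDIA-related is an assumption, and nvidia_events is a hypothetical name.

```python
import re

# Matches a dmesg line such as "[   69.185755] NVRM: loading NVIDIA UNIX x86_64 ..."
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Substrings assumed to identify messages from the NVIDIA modules (not exhaustive).
NVIDIA_TAGS = ("NVRM", "nvidia")


def nvidia_events(dmesg_text):
    """Return (seconds_since_boot, message) pairs for NVIDIA-related dmesg lines."""
    events = []
    for line in dmesg_text.splitlines():
        match = DMESG_LINE.match(line.strip())
        if match and any(tag in match.group(2) for tag in NVIDIA_TAGS):
            events.append((float(match.group(1)), match.group(2)))
    return events
```

The input can come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout. Some systems restrict dmesg to root. Applied to the snippet above, the function keeps every line except the VFIO one.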

I went through this a couple of times, then finally removed the Nouveau blacklisting and switched back to my CUDA-less configuration without issue. This feels like some kind of error on my part, but it’s not obvious to me. I’m not sure whether this matters, but in both GUI and runlevel-3 mode my monitor is plugged into the GTX1050 GPU, not the motherboard’s video output from the 7700k. It seems like this method should work. I’m not averse to using the Debian package installer method if that is more likely to succeed, but I’d rather understand what is going on here. Thanks in advance for any suggestions.

Using the runfile should actually be the easiest route.
Can you download the 10.0.130 runfile and driver 410.48?
Then install the driver, reboot, confirm that it is loaded, and install the toolkit without the driver when the installer asks.
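
For the "confirm you have it loaded" step, nvidia-smi is the usual quick check. The sketch below does the same from Python by reading /proc/driver/nvidia/version, which exists only while the nvidia kernel module is loaded. It assumes that file's first line uses the same "Kernel Module  <version>" wording as the NVRM line in the dmesg snippet above. The function names are hypothetical, and the 410.48 threshold is simply the driver version suggested in this reply.

```python
import re
from pathlib import Path

# Expected first line, e.g.:
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  418.67  Sat Apr  6 03:07:24 CDT 2019"
VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def driver_version(version_text):
    """Return the driver version as a tuple of ints, or None if it is not found."""
    match = VERSION_RE.search(version_text)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


def loaded_driver_version(path="/proc/driver/nvidia/version"):
    """Return the version of the currently loaded NVIDIA kernel module, if any."""
    try:
        return driver_version(Path(path).read_text())
    except OSError:  # the file is absent when the nvidia module is not loaded
        return None


def at_least(version, minimum=(410, 48)):
    """True if version is known and not older than minimum."""
    return version is not None and version >= minimum
```

Comparing tuples of integers instead of raw strings keeps, for example, 418.67 correctly ordered above 410.48.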

If you are not installing for system-wide use, try installing the toolkit just for yourself in your home directory (as your own user, not root).
See how it goes.
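
With a per-user toolkit, the shell still needs to find nvcc and the CUDA libraries under the chosen directory. The installation guide's post-installation environment setup (PATH and LD_LIBRARY_PATH) covers this. This is a minimal sketch, assuming the usual bin and lib64 layout under the toolkit directory. cuda_env and the ~/cuda-10.0 location are hypothetical.

```python
import os
from pathlib import Path


def cuda_env(toolkit_dir, env=None):
    """Return PATH and LD_LIBRARY_PATH values with a CUDA toolkit prepended.

    toolkit_dir is wherever the runfile put the toolkit, e.g. a hypothetical
    ~/cuda-10.0 for a per-user install. env defaults to os.environ.
    """
    env = dict(os.environ if env is None else env)
    root = Path(toolkit_dir).expanduser()
    updated = {}
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib64")):
        entries = [str(root / subdir)]
        if env.get(var):  # keep any existing entries after the toolkit's
            entries.append(env[var])
        updated[var] = os.pathsep.join(entries)
    return updated
```

For example, subprocess.run(["nvcc", "--version"], env={**os.environ, **cuda_env("~/cuda-10.0")}) would run the per-user nvcc without changing the global environment. Prepending instead of appending lets the per-user toolkit take precedence over any system-wide copy.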

Thanks @saulocpp. I may give that a try. Before I do, I found this nugget of advice in the Nvidia guide. Note carefully the last sentence.

The CUDA Toolkit can be installed using either of two different installation mechanisms: distribution-specific packages (RPM and Deb packages), or a distribution-independent package (runfile packages). The distribution-independent package has the advantage of working across a wider set of Linux distributions, but does not update the distribution’s native package management system. The distribution-specific packages interface with the distribution’s native package management system. It is recommended to use the distribution-specific packages, where possible.

I take “use the distribution-specific packages” to mean the Package Manager based install. I’m not even sure why I didn’t try that first: no fussing with blacklisting Nouveau or runlevel-3 boots. Sounds too good to be true. I downloaded “cuda-repo-ubuntu1804-10-1-local-10.1.168-418.67_1.0-1_amd64.deb” and will try the Package Manager steps in section 3.6 at the link below, then follow up.
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#ubuntu-installation
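
For a local-repo .deb, the core of the Package Manager route is four commands. You install the repo package with dpkg, add its public key, refresh apt, and install the cuda meta-package. The sketch below only builds and prints those commands and executes nothing. The /var/cuda-repo-<version>/7fa2af80.pub key path follows NVIDIA's template for these local repos. Deriving <version> from the file name is an assumption, so confirm the directory with ls /var/cuda-repo-* first. local_repo_commands is a hypothetical name.

```python
import shlex
from pathlib import PurePosixPath


def local_repo_commands(deb_file):
    """Return the install commands for a local CUDA repo .deb as argument lists.

    Assumption: the package unpacks to /var/cuda-repo-<version>/, where <version>
    is the part of the file name between "cuda-repo-ubuntu1804-" and the first "_".
    """
    name = PurePosixPath(deb_file).name
    prefix = "cuda-repo-ubuntu1804-"
    if not name.startswith(prefix) or not name.endswith(".deb"):
        raise ValueError(f"unexpected file name: {name}")
    version = name[len(prefix):].split("_", 1)[0]
    return [
        ["sudo", "dpkg", "-i", deb_file],
        ["sudo", "apt-key", "add", f"/var/cuda-repo-{version}/7fa2af80.pub"],
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", "cuda"],
    ]


if __name__ == "__main__":
    deb = "cuda-repo-ubuntu1804-10-1-local-10.1.168-418.67_1.0-1_amd64.deb"
    for cmd in local_repo_commands(deb):
        print(shlex.join(cmd))  # print only; nothing is executed (Python 3.8+)
```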

Thanks. -Shep

The Package Manager install worked flawlessly. I should have used it first, as the doc suggested. All set here. Thanks. -Shep
