Problem installing 470.94 on Ubuntu 18.04 for GeForce RTX 3080TI

gmarcais · January 25, 2022, 3:43pm

The kernel module compiles successfully, but loading it fails with this error:

[53764.480742] VFIO - User Level meta-driver version: 0.3
[53764.535519] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[53764.536991] nvidia 0000:83:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[53764.537038] NVRM: The NVIDIA GPU 0000:83:00.0 (PCI ID: 10de:2208)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.94 driver release.
NVRM: Please see ‘Appendix A - Supported NVIDIA GPU Products’
NVRM: in this release’s README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[53764.537140] nvidia: probe of 0000:83:00.0 failed with error -1
[53764.537159] NVRM: The NVIDIA probe routine failed for 1 device(s).
[53764.537159] NVRM: None of the NVIDIA devices were initialized.
[53764.537298] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
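The NVRM message above names the rejected card by bus address and PCI vendor:device ID (10de is NVIDIA's PCI vendor ID). A minimal Python sketch for pulling those fields out of a kernel log, so they can be compared against Appendix A of the driver README. `find_unsupported_gpu` is a hypothetical helper. It assumes the message keeps the exact wording shown in the log above.

```python
import re
from typing import Optional, Tuple

# Matches the first line of the multi-line NVRM message, e.g.
# "NVRM: The NVIDIA GPU 0000:83:00.0 (PCI ID: 10de:2208)"
GPU_LINE = re.compile(
    r"NVRM: The NVIDIA GPU (?P<bus>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r" \(PCI ID: (?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\)",
    re.IGNORECASE,
)


def find_unsupported_gpu(log: str) -> Optional[Tuple[str, str, str]]:
    """Return (bus address, vendor ID, device ID) for a GPU the driver
    reported as unsupported, or None if no such message is found."""
    match = GPU_LINE.search(log)
    # The "not supported" wording continues on the following NVRM lines.
    if match is None or "is not supported by the" not in log[match.end():]:
        return None
    return (
        match.group("bus"),
        match.group("vendor").lower(),
        match.group("device").lower(),
    )
```

As the replies below show, this message can be misleading: an unsupported-device report does not by itself prove that the card is missing from the driver's support list.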

Nouveau is blacklisted and not loaded. The 470 driver should support this card.

Thank you for your help.

generix · January 25, 2022, 3:58pm

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

gmarcais · January 25, 2022, 4:10pm

nvidia-bug-report.log.gz (68.1 KB)

Here it is.

generix · January 25, 2022, 4:41pm

I’d say the message is misleading. The logs show that the GPU was working fine for quite some time until the driver reported it failing during operation. Then you tried upgrading and downgrading the driver, and now it doesn’t even recognize the type of GPU anymore. I guess it’s simply broken. Please remove it and check if it works in another system.

gmarcais · January 25, 2022, 4:47pm

Here is the sequence of events. We had an older GPU in this machine (a Tesla series card, I believe), and that card was working. I removed it Friday evening and tried to install the driver last night and this morning. The card in the machine right now is brand new.

I should mention that this is a shared server where X will not be used. The intent is to use CUDA via PyTorch.

generix · January 25, 2022, 5:00pm

The logs are from January 24th, yesterday. You were installing driver 440, which is too old. Then you installed 470.94, which worked. Then you installed 510.39, which worked. Then you installed 465.19.01, which worked for some time, and then it broke. Then you rebooted, and it was still broken. Then you wildly installed all kinds of driver versions, none of which ever worked again. Maybe you just confused the GPU by installing/loading/unloading/loading/unloading/installing another driver/loading/unloading…, so all it needs is a power-off.
Or it’s just broken. Please check if it works in another system.
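A timeline like the one above can be recovered from the kernel log by listing the driver version each time the module is loaded. A minimal Python sketch, assuming the module announces itself with a banner of the form shown in the comment (verify this against your own log before relying on it). `driver_load_sequence` is a hypothetical helper, and the sample lines in any test input are illustrative, not taken from the attached report.

```python
import re
from typing import List

# ASSUMPTION: each module load logs a banner such as
# "NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.94  <build date>".
LOAD_LINE = re.compile(r"NVRM: loading NVIDIA UNIX \S+ Kernel Module\s+(\d+(?:\.\d+)+)")


def driver_load_sequence(log: str) -> List[str]:
    """Return driver versions in the order they were loaded,
    merging consecutive loads of the same version."""
    sequence: List[str] = []
    for version in LOAD_LINE.findall(log):
        if not sequence or sequence[-1] != version:
            sequence.append(version)
    return sequence
```

Merging repeats keeps reloads of the same version (for example across reboots) from cluttering the timeline, so each entry marks an actual version change.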

gmarcais · January 25, 2022, 5:28pm

Thank you for your help. A reboot followed by a clean run of NVIDIA-Linux-x86_64-470.94.run was successful.

Now the goal is to install CUDA 11.3, as this is the version supported by PyTorch. The installer recommended by NVIDIA’s website is cuda_11.3.0_465.19.01_linux.run. Is that the source of the 465 driver that confused me and the GPU? Should I not install the 470 driver at all?
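The runfile name itself encodes two versions: the CUDA toolkit version first, then the version of the driver bundled with it (here 11.3.0 and 465.19.01). A minimal Python sketch that splits the name and compares driver versions numerically. Treating the bundled driver as the minimum the toolkit needs is an assumption made for illustration; the authoritative minimums are in the CUDA release notes' driver compatibility table. NVIDIA drivers are generally backward compatible with older toolkits, which is why a newer driver such as 470.94 passes this check. All function names are hypothetical.

```python
import re
from typing import Tuple

# e.g. "cuda_11.3.0_465.19.01_linux.run" -> toolkit "11.3.0", bundled driver "465.19.01"
RUNFILE = re.compile(r"cuda_(?P<cuda>[\d.]+)_(?P<driver>[\d.]+)_linux\.run")


def parse_runfile(name: str) -> Tuple[str, str]:
    """Split a CUDA runfile name into (toolkit version, bundled driver version)."""
    match = RUNFILE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized runfile name: {name}")
    return match.group("cuda"), match.group("driver")


def version_key(version: str) -> Tuple[int, ...]:
    # Compare component by component as integers, so "470.94" > "465.19.01".
    return tuple(int(part) for part in version.split("."))


def driver_is_at_least(installed: str, required: str) -> bool:
    """ASSUMPTION: 'required' is the minimum driver the toolkit needs."""
    return version_key(installed) >= version_key(required)
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises such as "5xx" sorting above "46x.10" for the wrong reason.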

gmarcais · January 25, 2022, 6:45pm

I understand that cuda_11.3.0 also tries to install a kernel driver, so I properly removed 470 and installed cuda_11.3.0 with its 465 driver. The driver now loads, but PyTorch still fails to see the GPU, and dmesg reports errors (“RmInitAdapter failed!”), even after a reboot.

nvidia-bug-report.log.gz (300.6 KB)

generix · January 26, 2022, 9:24am

It’s also falling off the bus. I repeat:
I guess it’s simply broken. Please remove it and check if it works in another system.
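The diagnosis here rests on specific kernel messages recurring across driver versions and reboots. A minimal Python sketch of a simple substring scan that counts those signals in a dmesg dump. "RmInitAdapter failed" is quoted in this thread; the exact "fallen off the bus" wording is an assumption. `count_fault_markers` and `looks_like_hardware_fault` are hypothetical helpers, and a positive result is only a hint to test the card in another system, not proof of a hardware fault.

```python
from collections import Counter
from typing import Dict

# Substrings treated as hardware-fault signals. "RmInitAdapter failed" is quoted
# in the thread; the "fallen off the bus" wording is an ASSUMPTION.
FAULT_MARKERS = ("RmInitAdapter failed", "fallen off the bus")


def count_fault_markers(log: str) -> Dict[str, int]:
    """Count how many log lines contain each fault marker."""
    counts = Counter()
    for line in log.splitlines():
        for marker in FAULT_MARKERS:
            if marker in line:
                counts[marker] += 1
    return {marker: counts[marker] for marker in FAULT_MARKERS}


def looks_like_hardware_fault(log: str) -> bool:
    # Heuristic only: any marker suggests checking the card in another system.
    return any(count_fault_markers(log).values())
```

Running the scan on logs from several boots and driver versions helps separate a persistent fault from a one-off driver mix-up.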
