Hi. I’ve been using a GTX 970 and a Tesla K80 on my current system (not with both cards installed simultaneously). Yesterday I bought a new RTX 2080 Super and installed it (with no other GPU cards). When I boot, the system never fully comes up. I’ve tried several Linux kernels (4.15.0-106, -101, -76, and 4.4.0-96).
@generix or @aplattner, maybe one of you has advice? I booted in recovery mode and ran a bug report, which I’ll attach. Here is a snippet of the output that seems to highlight a problem:
Scanning kernel log files for NVIDIA kernel messages:
Jun 22 19:28:32 synapse kernel: [ 2.996371] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
Jun 22 19:28:32 synapse kernel: [ 3.045432] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019
Jun 22 19:28:32 synapse kernel: [ 3.051279] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 418.87.00 Thu Aug 8 15:27:13 CDT 2019
Jun 22 19:28:32 synapse kernel: [ 3.052109] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jun 22 19:28:32 synapse kernel: [ 3.052110] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Jun 22 19:28:32 synapse kernel: [ 3.230229] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 235
Jun 22 19:28:32 synapse kernel: [ 3.893568] NVRM: GPU at PCI:0000:01:00: GPU-c7bb995b-4d05-b0c7-ac5f-a32686291914
Jun 22 19:28:32 synapse kernel: [ 3.893569] NVRM: GPU Board Serial Number:
Jun 22 19:28:32 synapse kernel: [ 3.893570] NVRM: Xid (PCI:0000:01:00): 61, 0c06(3034) 00000000 00000000
Jun 22 19:28:32 synapse kernel: [ 7.882734] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
Jun 22 19:28:32 synapse kernel: [ 7.882805] NVRM: rm_init_adapter failed for device bearing minor number 0
Jun 22 19:28:41 synapse kernel: [ 16.754492] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
Jun 22 19:28:41 synapse kernel: [ 16.754509] NVRM: rm_init_adapter failed for device bearing minor number 0
Jun 22 19:28:49 synapse kernel: [ 24.822561] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
Jun 22 19:28:49 synapse kernel: [ 24.822627] NVRM: rm_init_adapter failed for device bearing minor number 0
I’m using Ubuntu 18.04.4, Linux kernel 4.15.0-106, and NVIDIA driver 418.87.00.
nvidia-bug-report.log (1.8 MB)
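For anyone who wants to pull the same failure lines out of their own kernel log, here is a minimal sketch. The function name nvrm_errors is made up for illustration, and the pattern only covers the three message types visible in the snippet above, so other NVRM failures would need the pattern extended.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not part of any NVIDIA tool):
# reads a kernel log on stdin and prints only the NVRM lines that report
# an Xid event or a failed adapter initialisation.
nvrm_errors() {
    grep -E 'NVRM: (Xid|RmInitAdapter failed|rm_init_adapter failed)'
}

# Example usage (not run here):
#   dmesg | nvrm_errors
#   nvrm_errors < /var/log/kern.log
```

Informational lines such as "NVRM: loading NVIDIA UNIX x86_64 Kernel Module" are filtered out, which makes repeated RmInitAdapter failures easier to spot.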
The RmInitAdapter messages are most likely hardware-related.
As a first step, please reseat the card in its slot; it may not be fully or correctly seated.
If it still doesn’t work, please check whether it works in another mainboard/system; if it doesn’t, the GPU is broken.
Thanks! I tried the card on a different machine (running Windows 10) and it worked as expected.
Then, back to my Ubuntu machine. I had been using the mainboard HDMI output for video, so I tried using the GPU card for video instead. In that case I see some more of the startup messages. It stalled initially with the message: “Wait until snapd is fully seeded.”
Then after 5-10 minutes there were more messages and it stalled with: “BIOS contain WGDS but not WRDS” which seems to be something related to wifi.
Are you sure 418.87 is the correct driver? When I look at the NVIDIA drivers page, the oldest 20XX driver I can see is 430.34:
"Added support for the following GPUs:
GeForce RTX 2060 SUPER
GeForce RTX 2070 SUPER
GeForce RTX 2080 SUPER"
You might want to check this link to get the correct driver to suit the CUDA toolkit you’re using.
Thanks, @rs277. Yes, I was wondering about that. I have CUDA 10.1 and thus 418.87 should be sufficient (barely) in general, but I don’t know if that’s sufficient for the RTX 2080. I’m a little unclear as to how best to upgrade the driver to 430/440 given that I can’t boot the machine with the RTX 2080 installed, except to recovery mode.
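A quick way to compare an installed driver version against a required minimum is sketched below. It takes 430.34 from the release note quoted above as the assumed minimum for the SUPER cards. The function name version_ge is hypothetical, and it relies on GNU sort's -V option, which is present on Ubuntu but not on every system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal
# to version $2, using GNU sort's version ordering (sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (not run here). The module name "nvidia" is an assumption
# and may differ depending on how the driver was packaged:
#   installed=$(modinfo -F version nvidia)
#   version_ge "$installed" 430.34 || echo "driver $installed is older than 430.34"
```

With the versions from this thread, version_ge 418.87.00 430.34 fails, while a 440-series version passes.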
I’m unfamiliar with Ubuntu, but the drivers can be installed from the command line, so I’d have thought that could be done in recovery mode.
To get the repo driver plus CUDA 10.1, first remove any existing cuda and nvidia packages (the patterns are quoted so the shell passes them to apt unexpanded):
sudo apt remove '*nvidia*' 'cuda*'
Afterwards, install the repo driver:
sudo apt install nvidia-driver-440
Then download the CUDA package and follow the install instructions on the download page, but skip the last step (installing ‘cuda’), since that would overwrite the repo driver. Instead, run
sudo apt install cuda-toolkit-10-1
to install just the toolkit.
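Before reinstalling, it can help to confirm which nvidia and cuda packages are still present after the removal step. The sketch below filters the output of dpkg -l. The function name leftover_gpu_packages is made up for illustration, and the name patterns simply mirror the wildcards used in the remove command above.

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and prints
# "<status> <package>" for every package whose name contains "nvidia"
# or starts with "cuda". Header lines are skipped because their first
# field is not a two-letter status code.
leftover_gpu_packages() {
    awk '$1 ~ /^[a-z][a-zA-Z]$/ && ($2 ~ /nvidia/ || $2 ~ /^cuda/) { print $1, $2 }'
}

# Example usage (not run here):
#   dpkg -l | leftover_gpu_packages
```

A status of "ii" means the package is still installed; "rc" means it was removed but its configuration files remain.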
@generix, I followed your advice, but on the last step I get this:
mroos@synapse:Downloads$ sudo apt install cuda-toolkit-10-1
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package cuda-toolkit-10-1 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'cuda-toolkit-10-1' has no installation candidate
Where can I find the package?
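One way to check whether apt currently knows an installable version of a package is to look at the Candidate line printed by apt-cache policy. The sketch below wraps that check. The function name has_candidate is hypothetical, and it assumes the usual "Candidate: <version>" / "Candidate: (none)" layout of that command's output.

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache policy <package>` output on stdin
# and succeeds only if a real candidate version is listed. Empty input
# (package unknown to apt) counts as "no candidate".
has_candidate() {
    awk '/^ *Candidate:/ { found = ($2 != "(none)") } END { exit found ? 0 : 1 }'
}

# Example usage (not run here):
#   apt-cache policy cuda-toolkit-10-1 | has_candidate \
#       || echo "no installation candidate for cuda-toolkit-10-1"
```

If the check fails, none of the configured package sources currently provides the package, which matches the "has no installation candidate" error shown above.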
I did a more thorough removal of all things nvidia and cuda, via this sequence of commands:
Then I repeated @generix’s install suggestion above. Everything is working now.