I have been trying for 5 days to install the NVIDIA drivers for my graphics cards (in order to install CUDA).
Kernels tested: 4.15, 4.18, 4.20 (self-compiled)
Drivers tested from the PPA: 410, 415, 418, 430
Drivers tested from the NVIDIA website: nvidia-linux-x86_64-418.56.run, nvidia-linux-x86_64-430.09.run, cuda_10.1.105_418.39_linux.run
OS versions tested: 16.04 LTS, 18.04 LTS
With and without "quiet", "splash" and "nomodeset" in the boot options
With "nouveau" blacklisted, and with "nouveau" completely removed via apt
With gcc 7.3, and after upgrading it to 7.4
With glibc 2.27
And I have a 1000 W power supply.
The driver installation works fine, but when I reboot there is horizontal multi-colored snow on my screen and the system gets stuck in an endless loop while trying to launch the desktop. The loop keeps trying to do something with the GPU, and every time it touches the GPU driver the computer freezes: the multi-colored snow appears for 30 seconds, disappears for 3 seconds, and comes back.
After a few minutes I can switch to a console with Ctrl+Alt+Fx while there is no snow, but after 2 seconds it sends me back to the desktop launch, so I have no time to do anything.
I can navigate the machine and remove the drivers by using GRUB recovery mode, which gets me back to a usable state with the "nouveau" driver again, but then I can't use the GPUs.
In recovery mode, "nvidia-smi" doesn't find any graphics card.
I tried adding a small graphics card (a Quadro P600), which is found by "nvidia-smi", so the driver installation works for that card (driver version 418).
I tried with only one graphics card, and the only change is the color of the snow (it becomes purple/pink).
I have tried almost everything I can, but nothing works for me.
Has anybody had the same problem and found a way to resolve it?
nvidia-bug-report.log.gz (585 KB)
This rather sounds like a hardware failure of the GPU or some incompatibility with the mainboard.
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
To have errors logged, install the driver from the PPA (it includes the nvidia-bug-report.sh script), let it crash, then either try reaching a VT (Ctrl+Alt+F1), use ssh, or boot to recovery mode and run the script.
Thank you for the answer,
I have attached the nvidia-bug-report to the first message.
Both 2080 Tis fail to initialize:
May 13 12:11:34 asr-MS-7A93 kernel: [ 275.631161] NVRM: rm_init_adapter failed for device bearing minor number 0
May 13 12:11:43 asr-MS-7A93 kernel: [ 284.111165] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
May 13 12:11:43 asr-MS-7A93 kernel: [ 284.111190] NVRM: rm_init_adapter failed for device bearing minor number 1
May 13 12:11:51 asr-MS-7A93 kernel: [ 292.598253] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
May 13 12:11:51 asr-MS-7A93 kernel: [ 292.598281] NVRM: rm_init_adapter failed for device bearing minor number 1
You should check them one by one to see if any of them is working when alone in the system and RMA them otherwise.
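When comparing logs from several boots or from one card at a time, it can help to tally the NVRM failures per device minor number and per RmInitAdapter error code. The following is a minimal sketch, assuming the kernel log has already been saved as plain text (for example from dmesg or from the nvidia-bug-report log); the function name summarize_nvrm_failures and the SAMPLE text are illustrative only, and the regular expressions match just the two message formats quoted above.

```python
from __future__ import annotations

import re
from collections import Counter

# Patterns for the two NVRM messages quoted in the thread (assumed formats).
MINOR_RE = re.compile(r"NVRM: rm_init_adapter failed for device bearing minor number (\d+)")
CODE_RE = re.compile(r"NVRM: RmInitAdapter failed! \(([^)]*)\)")


def summarize_nvrm_failures(log_text: str) -> tuple[Counter, Counter]:
    """Hypothetical helper: count init failures per minor number and per error code."""
    per_minor: Counter = Counter()
    per_code: Counter = Counter()
    for line in log_text.splitlines():
        minor = MINOR_RE.search(line)
        if minor:
            per_minor[int(minor.group(1))] += 1
            continue
        code = CODE_RE.search(line)
        if code:
            per_code[code.group(1)] += 1
    return per_minor, per_code


# The log excerpt posted above, used as sample input.
SAMPLE = """\
May 13 12:11:34 asr-MS-7A93 kernel: [ 275.631161] NVRM: rm_init_adapter failed for device bearing minor number 0
May 13 12:11:43 asr-MS-7A93 kernel: [ 284.111165] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
May 13 12:11:43 asr-MS-7A93 kernel: [ 284.111190] NVRM: rm_init_adapter failed for device bearing minor number 1
May 13 12:11:51 asr-MS-7A93 kernel: [ 292.598253] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
May 13 12:11:51 asr-MS-7A93 kernel: [ 292.598281] NVRM: rm_init_adapter failed for device bearing minor number 1
"""

if __name__ == "__main__":
    minors, codes = summarize_nvrm_failures(SAMPLE)
    for minor, count in sorted(minors.items()):
        print(f"minor {minor}: {count} failed init(s)")
    print(dict(codes))
```

On the excerpt above, both minor numbers show failures and every RmInitAdapter line carries the same code, which fits the observation that both cards fail in the same way. Running the same summary on a log from each card tested alone would show whether the code changes.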
I already checked them one by one (but without running nvidia-bug-report) and they behaved the same way, so if both cards have the same error I am very unlucky.
Tomorrow I will check them card by card and produce a bug report for each, but I would like to test every other option before RMAing them.
If you already checked them one by one, you should rather check them in another system; it is very unlikely that something new would come up by testing them in the same mainboard.