I was following this thread and everything seemed to be working:
Hello all,
I’ll begin by stating that I don’t know what the “actual” problem is and I’m not very experienced in Linux, so I’ll try to provide as much information as possible but will probably miss something.
I have a machine I’m setting up for machine learning. It’s running an AMD 1600X and an RTX A4000.
The issue: When the computer boots into BIOS, then allows me to select the OS, then Linux starts the Runlevel programs, and when it gets to the “NVIDIA persistence daemon”, that’s when the screen…
However, at the DGX GUI user login, I am stuck in a loop: every time I enter my credentials, it just returns to the list of user logins.
My grub config looks like this after adding "nvidia-drm.modeset=1":
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="nvidia-drm.modeset=1"
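A change to the GRUB config only takes effect after the boot configuration is regenerated (sudo update-grub on Debian/Ubuntu-based systems) and the machine is rebooted. As a minimal sketch, the running kernel's command line could be checked for the parameter like this. The function name has_kernel_param is hypothetical, not part of any NVIDIA or GRUB tooling.

```shell
#!/usr/bin/env bash
# Sketch: check whether a kernel parameter is present on a command line.
# has_kernel_param is a hypothetical helper name used only for illustration.

has_kernel_param() {
    local param="$1" cmdline="$2"
    local -a words
    local word
    # Split the command line on whitespace without glob expansion.
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Usage on the affected machine (after update-grub and a reboot):
#   has_kernel_param nvidia-drm.modeset=1 "$(cat /proc/cmdline)" \
#       && echo "parameter active" || echo "parameter missing"
```

If the parameter is missing after a reboot, the GRUB configuration was most likely not regenerated.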
My goal is to be able to use the GUI again. It was working before,
but the NVIDIA Persistence Daemon started failing after I upgraded CUDA to 12.1.1 (the upgrade itself succeeded).
I can still ssh into this DGX and can provide diagnostic info upon request.
Thank you very much!
Update:
Actually, with the GRUB update above,
nvidia-smi gives this error:
Failed to initialize NVML: Driver/library version mismatch
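This error usually means the NVIDIA kernel module currently loaded is not the same version as the user-space driver libraries. A rough way to confirm it, assuming the freshly installed module on disk matches the new libraries, is to compare the loaded module version with the one on disk. The function names below are hypothetical, and this is only a sketch of the check, not an official diagnostic.

```shell
#!/usr/bin/env bash
# Sketch: compare the loaded NVIDIA kernel module with the module on disk.
# All function names here are hypothetical helpers for illustration.

# Version of the module currently loaded in the kernel (empty if not loaded).
loaded_nvidia_version() {
    cat /sys/module/nvidia/version 2>/dev/null
}

# Version of the module that modprobe would load for the running kernel.
ondisk_nvidia_version() {
    modinfo -F version nvidia 2>/dev/null
}

# Compare two version strings and print a verdict.
compare_versions() {
    local loaded="$1" ondisk="$2"
    if [ -z "$loaded" ]; then
        echo "nvidia module not loaded"
    elif [ "$loaded" = "$ondisk" ]; then
        echo "match: $loaded"
    else
        echo "mismatch: loaded=$loaded on-disk=$ondisk"
    fi
}

# Usage on the affected machine:
#   compare_versions "$(loaded_nvidia_version)" "$(ondisk_nvidia_version)"
```

A mismatch here would suggest that an older module is being loaded at boot, for example from a stale initramfs.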
I have to do the following to get nvidia-smi working again with 12.1.1, but the first command kills the GUI and puts it back in the “Failed to start NVIDIA Persistence Daemon” state:
sudo service gdm3 stop
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
lsmod | grep nvidia
nvidia-smi
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
generix:
nvidia-bug-report.sh
nvidia-bug-report.log.gz (2.6 MB)
Thanks a lot!
Please post the output of dkms status
Please run
sudo dkms remove nvidia/470.42.01 --all
sudo dkms install nvidia/530.30.02
sudo update-initramfs -u
After reboot, please create a new nvidia-bug-report.log and post the output of dkms status.
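To double-check the result of the steps above, the dkms status output can be scanned for NVIDIA module versions other than the one that should remain. This is a minimal sketch: stale_nvidia_dkms is a hypothetical helper, and it assumes the two line formats I have seen from dkms ("nvidia, VERSION, ..." and "nvidia/VERSION, ...").

```shell
#!/usr/bin/env bash
# Sketch: list NVIDIA DKMS versions that differ from the expected one.
# stale_nvidia_dkms is a hypothetical helper; it reads `dkms status`
# output on stdin and prints each unexpected version once.

stale_nvidia_dkms() {
    local expected="$1"
    # Normalize the newer "nvidia/VER" form to the older "nvidia, VER" form.
    sed 's|^nvidia/|nvidia, |' |
    awk -F', *' -v want="$expected" '
        $1 == "nvidia" {
            ver = $2
            sub(/:.*/, "", ver)      # handles lines like "nvidia, VER: added"
            if (ver != want) print ver
        }' | sort -u
}

# Usage on the affected machine:
#   dkms status | stale_nvidia_dkms 530.30.02
# No output means only the expected version is registered with DKMS.
```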
Thanks so much, please help check the bug report and the dkms status.
dkms status output:
nvidia, 530.30.02, 4.15.0-212-generic, x86_64: installed
nvidia-bug-report.log.gz (2.9 MB)
Seems to be working now. To be on the safe side, you should check if there are some 470 driver packages left
dpkg -l |grep nvidia
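The listing can be long, so here is a small sketch of a filter that keeps only NVIDIA packages belonging to a given driver branch. leftover_packages is a hypothetical helper, and it assumes the branch number shows up either in the package name (as "-470") or at the start of the version column.

```shell
#!/usr/bin/env bash
# Sketch: filter `dpkg -l` output for NVIDIA packages from one driver branch.
# leftover_packages is a hypothetical helper; it reads `dpkg -l` output on
# stdin and prints status, package name, and version for each hit.

leftover_packages() {
    local branch="$1"
    awk -v b="$branch" '
        $2 ~ /nvidia/ && (index($2, "-" b) > 0 || index($3, b ".") == 1) {
            print $1, $2, $3
        }'
}

# Usage on the affected machine:
#   dpkg -l | leftover_packages 470
# A status of "rc" means the package is removed but its config files remain.
```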
Yes, it is working now, both the GUI and CUDA.
Thank you so much!!