Hello all,
I’ll begin by stating that I don’t know what the “actual” problem is and I’m not very experienced in Linux, so I’ll try to provide as much information as possible but will probably miss something.
I have a machine I’m setting up for machine learning. It’s running an AMD 1600X and an RTX A4000.
The issue: the computer goes through the BIOS, lets me select the OS, and Linux starts the runlevel programs; when it gets to the “NVIDIA persistence daemon”, the screen “freezes”. Secure Boot is not enabled, and the computer still works just fine when I SSH in. It’s an older AMD chip, so there is no integrated graphics.
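Since SSH still works, the log of the frozen boot can be read remotely, for example with `sudo journalctl -b` (current boot) or `sudo journalctl -b -u nvidia-persistenced.service` (assuming the systemd unit carries the package's name). A minimal bash sketch of a filter for that output; the function name `nvidia_log_lines` and its keyword list are illustrative assumptions, not part of any NVIDIA or Debian tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the log lines (read from stdin) that mention
# the NVIDIA kernel module (NVRM), the driver, or the persistence daemon.
# Matching is case-insensitive; the keyword list is an assumption.
nvidia_log_lines() {
  grep -iE 'nvrm|nvidia|persistenced'
}
```

Usage over SSH: `sudo journalctl -b | nvidia_log_lines`. grep exits with status 1 when nothing matches, which is itself informative here.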
Nouveau should be disabled: I blacklisted it by following Install Nvidia Drivers on Debian/Ubuntu | Kinetica Docs, and confirmed it by running the command
sudo lsmod | grep nouveau
which returns nothing.
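`grep nouveau` on `lsmod` output matches substrings anywhere on the line; matching the first column exactly is a little stricter (it would not confuse, say, `nvidia` with `nvidia_drm`). `lsmod` itself does not need root. A minimal sketch, assuming bash and awk; `module_loaded` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the module named in $1 appears in
# lsmod-style output read from stdin, fail (exit 1) otherwise.
# The first lsmod line is a header ("Module Size Used by"), so it is skipped.
module_loaded() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

Usage: `lsmod | module_loaded nouveau && echo "nouveau loaded" || echo "nouveau not loaded"`.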
The drivers are installed correctly as far as I can tell. My APT sources look like this:
deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
# bookworm-updates, to get updates before a point release is made;
# see https://www.debian.org/doc/manuals/debian-reference/ch02.en.html#_updates_and_backports
deb http://deb.debian.org/debian/ bookworm-updates main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian/ bookworm-updates main contrib non-free non-free-firmware
# This system was installed using small removable media
# (e.g. netinst, live or single CD). The matching "deb cdrom"
# entries were disabled at the end of the installation process.
# For information about how to configure apt package sources,
# see the sources.list(5) manual.
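The `nvidia-driver` package comes from Debian's non-free component, so every `deb` line that is expected to provide it needs `non-free` listed. A minimal sketch that prints any active `deb`/`deb-src` entry missing a given component; it assumes the classic one-line format without `[option=value]` brackets, and `missing_component` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print active one-line "deb"/"deb-src" entries (read from
# stdin) that do not list the component given in $1. Comment lines are ignored.
# Assumed layout: type URI suite component... (no [options] field).
missing_component() {
  awk -v c="$1" '
    $1 == "deb" || $1 == "deb-src" {
      ok = 0
      for (i = 4; i <= NF; i++) if ($i == c) ok = 1  # exact match only
      if (!ok) print
    }'
}
```

Usage: `missing_component non-free < /etc/apt/sources.list`. No output means every entry lists `non-free`; the exact match means `non-free-firmware` alone does not count.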
The system also recognizes the hardware, as shown by running
lspci | grep -i "nvidia"
1c:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1)
1c:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
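`lspci -k` additionally reports which kernel driver is bound to each device, and `-d 10de:` restricts the listing to NVIDIA's PCI vendor ID (the `10de` in the chip-ID `10de:24b0` that inxi reports). A minimal awk sketch that condenses that output to `slot driver` pairs; `drivers_in_use` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lspci -k` output on stdin and print one
# "slot driver" pair per device that has a bound kernel driver.
drivers_in_use() {
  awk '
    /^[0-9a-f]/               { slot = $1 }      # device header line, e.g. "1c:00.0 VGA ..."
    /Kernel driver in use:/   { print slot, $NF } # indented detail line
  '
}
```

Usage: `lspci -k -d 10de: | drivers_in_use`.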
Packages found with
dpkg -l | grep -i nvidia
ii firmware-nvidia-gsp 525.125.06-1~deb12u1 amd64 NVIDIA GSP firmware
ii glx-alternative-nvidia 1.2.2 amd64 allows the selection of NVIDIA as GLX provider
ii libcuda1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA CUDA Driver Library
ii libegl-nvidia0:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary EGL library
ii libgl1-nvidia-glvnd-glx:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary OpenGL/GLX library (GLVND variant)
ii libgles-nvidia1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia2:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary OpenGL|ES 2.x library
ii libglx-nvidia0:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary GLX library
ii libnvcuvid1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA CUDA Video Decoder runtime library
ii libnvidia-allocator1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA allocator runtime library
ii libnvidia-cfg1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-egl-gbm1:amd64 1.1.0-2 amd64 GBM EGL external platform library for NVIDIA
ii libnvidia-egl-wayland1:amd64 1:1.1.10-1 amd64 Wayland EGL External Platform library -- shared library
ii libnvidia-eglcore:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary EGL core libraries
ii libnvidia-encode1:amd64 525.125.06-1~deb12u1 amd64 NVENC Video Encoding runtime library
ii libnvidia-glcore:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary OpenGL/GLX core libraries
ii libnvidia-glvkspirv:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary Vulkan Spir-V compiler library
ii libnvidia-ml1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-ptxjitcompiler1:amd64 525.125.06-1~deb12u1 amd64 NVIDIA PTX JIT Compiler library
ii libnvidia-rtcore:amd64 525.125.06-1~deb12u1 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
ii nvidia-alternative 525.125.06-1~deb12u1 amd64 allows the selection of NVIDIA as GLX provider
ii nvidia-driver 525.125.06-1~deb12u1 amd64 NVIDIA metapackage
ii nvidia-driver-bin 525.125.06-1~deb12u1 amd64 NVIDIA driver support binaries
ii nvidia-driver-libs:amd64 525.125.06-1~deb12u1 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii nvidia-egl-common 525.125.06-1~deb12u1 amd64 NVIDIA binary EGL driver - common files
ii nvidia-egl-icd:amd64 525.125.06-1~deb12u1 amd64 NVIDIA EGL installable client driver (ICD)
ii nvidia-installer-cleanup 20220217+3~deb12u1 amd64 cleanup after driver installation with the nvidia-installer
ii nvidia-kernel-common 20220217+3~deb12u1 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 525.125.06-1~deb12u1 amd64 NVIDIA binary kernel module DKMS source
ii nvidia-kernel-support 525.125.06-1~deb12u1 amd64 NVIDIA binary kernel module support files
ii nvidia-legacy-check 525.125.06-1~deb12u1 amd64 check for NVIDIA GPUs requiring a legacy driver
ii nvidia-modprobe 535.54.03-1~deb12u1 amd64 utility to load NVIDIA kernel modules and create device nodes
ii nvidia-persistenced 525.85.05-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
ii nvidia-settings 525.125.06-1~deb12u1 amd64 tool for configuring the NVIDIA graphics driver
ii nvidia-smi 525.125.06-1~deb12u1 amd64 NVIDIA System Management Interface
ii nvidia-support 20220217+3~deb12u1 amd64 NVIDIA binary graphics driver support files
ii nvidia-vdpau-driver:amd64 525.125.06-1~deb12u1 amd64 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-vulkan-common 525.125.06-1~deb12u1 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 525.125.06-1~deb12u1 amd64 NVIDIA Vulkan installable client driver (ICD)
ii xserver-xorg-video-nvidia 525.125.06-1~deb12u1 amd64 NVIDIA binary Xorg driver
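Most packages above are at 525.125.06-1~deb12u1, but a few are not, for example nvidia-persistenced (525.85.05-1) and nvidia-modprobe (535.54.03-1~deb12u1). Some differences are expected, since helper packages such as nvidia-kernel-common (20220217+3~deb12u1) are versioned separately, but filtering the listing makes the outliers easy to spot. A minimal sketch reading `dpkg -l` lines on stdin; `version_mismatches` is a hypothetical name and the reference version is supplied by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dpkg -l` output on stdin and print "name version"
# for every installed package (status "ii") whose version differs from $1.
version_mismatches() {
  awk -v ref="$1" '$1 == "ii" && $3 != ref { print $2, $3 }'
}
```

Usage: `dpkg -l | grep -i nvidia | version_mismatches '525.125.06-1~deb12u1'`.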
And to confirm that everything is installed correctly, I can run:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A4000 On | 00000000:1C:00.0 Off | Off |
| 41% 40C P8 18W / 140W | 1MiB / 16376MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
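Disp.A in this table is the `display_active` field. According to `nvidia-smi --help-query-gpu`, `display_active` says whether a display is initialized on the GPU, while the separate `display_mode` field says whether a physical display is connected, so querying both may help tell "no monitor detected" apart from "monitor detected but nothing initialized". A minimal sketch, assuming the query `nvidia-smi --query-gpu=pci.bus_id,display_mode,display_active --format=csv,noheader`; `report_inactive_displays` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CSV rows "bus_id, display_mode, display_active"
# (nvidia-smi --format=csv,noheader) on stdin and report every GPU whose
# display_active value is not "Enabled".
report_inactive_displays() {
  awk -F ', *' '$3 != "Enabled" { print $1 ": display_mode=" $2 ", display_active=" $3 }'
}
```

Usage: `nvidia-smi --query-gpu=pci.bus_id,display_mode,display_active --format=csv,noheader | report_inactive_displays`.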
I’ve uninstalled and reinstalled the NVIDIA drivers.
I noticed that Disp.A = Off, so for some reason the card is not detecting the monitor even though it is plugged in. My first thought was to force a resolution, but enabling GRUB_GFXMODE=640x480 and running update-grub did not fix the problem.
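On Debian, /etc/default/grub normally ships this setting commented out as `#GRUB_GFXMODE=640x480`, and edits there only take effect after `update-grub` regenerates /boot/grub/grub.cfg. A minimal sketch of making that edit from a script, assuming GNU grep and GNU sed; `set_grub_var` is a hypothetical name, and values containing `|` or `&` would need escaping:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a /etc/default/grub-style file.
# An existing "KEY=" or "#KEY=" line is rewritten (uncommented) in place;
# otherwise a new line is appended.
set_grub_var() {
  local file=$1 key=$2 value=$3
  if grep -q "^#\?${key}=" "$file"; then
    sed -i "s|^#\?${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage from a root shell (e.g. after `sudo -i`): `set_grub_var /etc/default/grub GRUB_GFXMODE 640x480 && update-grub`.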
I also attempted to switch to tty2 using ALT-F2 (and CTRL-ALT-F2), but the computer was unresponsive. I pressed CTRL-ALT-DEL to make sure the computer was receiving keyboard input, and it restarted as expected.
And for those who are interested,
inxi -Fxxxzra
System:
Kernel: 6.1.0-13-amd64 arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
parameters: BOOT_IMAGE=/boot/vmlinuz-6.1.0-13-amd64
root=UUID=8ae679d5-60b9-4d97-aead-8dc82d1bd400 ro quiet rd.driver.blacklist=grub.nouveau
rcutree.rcu_idle_gp_delay=1 quiet nouveau.modeset=0
Console: pty pts/0 DM: GDM3 v: 43.0 Distro: Debian GNU/Linux 12 (bookworm)
Machine:
Type: Desktop Mobo: Micro-Star model: B350 TOMAHAWK (MS-7A34) v: 1.0
serial: <superuser required> UEFI-[Legacy]: American Megatrends v: 1.M0 date: 01/23/2019
CPU:
Info: model: AMD Ryzen 5 1600 bits: 64 type: MT MCP arch: Zen level: v3 note: check
built: 2017-19 process: GF 14nm family: 0x17 (23) model-id: 1 stepping: 1 microcode: 0x8001137
Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache: L1: 576 KiB
desc: d-6x32 KiB; i-6x64 KiB L2: 3 MiB desc: 6x512 KiB L3: 16 MiB desc: 2x8 MiB
Speed (MHz): avg: 1654 high: 2800 min/max: 1550/3200 boost: enabled scaling:
driver: acpi-cpufreq governor: schedutil cores: 1: 1550 2: 2800 3: 1550 4: 1550 5: 1550 6: 1550
7: 1550 8: 1550 9: 1550 10: 1550 11: 1550 12: 1550 bogomips: 76791
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: gather_data_sampling status: Not affected
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed mitigation: untrained return thunk; SMT vulnerable
Type: spec_rstack_overflow mitigation: safe RET
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, STIBP: disabled, RSB filling,
PBRSB-eIBRS: Not affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: NVIDIA GA104GL [RTX A4000] vendor: Lenovo driver: nvidia v: 525.125.06
non-free: 530.xx+ status: current (as of 2023-03) arch: Ampere code: GAxxx
process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 1 speed: 2.5 GT/s lanes: 16 link-max: gen: 4
speed: 16 GT/s bus-ID: 1c:00.0 chip-ID: 10de:24b0 class-ID: 0300
Display: server: X.org v: 1.21.1.7 with: Xwayland v: 22.1.9 driver: N/A note: X driver n/a
tty: 157x85
API: OpenGL Message: GL data unavailable in console. Try -G --display
Audio:
Device-1: NVIDIA GA104 High Definition Audio vendor: Lenovo driver: snd_hda_intel v: kernel
pcie: gen: 1 speed: 2.5 GT/s lanes: 16 link-max: gen: 4 speed: 16 GT/s bus-ID: 1c:00.1
chip-ID: 10de:228b class-ID: 0403
Device-2: AMD Family 17h HD Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel pcie:
gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 1e:00.3 chip-ID: 1022:1457 class-ID: 0403
API: ALSA v: k6.1.0-13-amd64 status: kernel-api tools: alsamixer,amixer
Server-1: PipeWire v: 0.3.65 status: active with: 1: pipewire-pulse status: active
2: wireplumber status: active 3: pipewire-alsa type: plugin tools: pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI
driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: f000 bus-ID: 19:00.0
chip-ID: 10ec:8168 class-ID: 0200
IF: enp25s0 state: up speed: 100 Mbps duplex: full mac: <filter>
IF-ID-1: docker0 state: down mac: <filter>
Drives:
Local Storage: total: 931.51 GiB used: 7.32 GiB (0.8%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 1TB size: 931.51 GiB
block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
rev: 3B4QFXO7 temp: 38.9 C scheme: MBR
Partition:
ID-1: / raw-size: 930.56 GiB size: 914.88 GiB (98.31%) used: 7.32 GiB (0.8%) fs: ext4
dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
ID-1: swap-1 type: partition size: 976 MiB used: 0 KiB (0.0%) priority: -2 dev: /dev/nvme0n1p5
maj-min: 259:3
Sensors:
System Temperatures: cpu: 38.9 C mobo: N/A gpu: nvidia temp: 41 C
Fan Speeds (RPM): N/A
Repos:
Packages: pm: dpkg pkgs: 1746 libs: 1005 tools: apt,apt-get,gnome-software,synaptic
Active apt repos in: /etc/apt/sources.list
1: deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
2: deb-src http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
3: deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
4: deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
5: deb http://deb.debian.org/debian/ bookworm-updates main contrib non-free non-free-firmware
6: deb-src http://deb.debian.org/debian/ bookworm-updates main contrib non-free non-free-firmware
No active apt repos in: /etc/apt/sources.list.d/docker.list
Info:
Processes: 216 Uptime: 31m wakeups: 0 Memory: 31.29 GiB used: 958.2 MiB (3.0%) Init: systemd
v: 252 target: multi-user (3) default: multi-user tool: systemctl Compilers: gcc: 12.2.0 alt: 12
Shell: Bash v: 5.2.15 running-in: pty pts/0 (SSH) inxi: 3.3.26
Thank you kindly
nvidia-bug-report.log.gz (314.4 KB)
systemd.txt (91.3 KB)