I have two DGX Spark units that I intended to cluster using a ConnectX cable. I first performed the initial setup for each device separately.
Both units were delivered on October 23, and I started using one of them immediately. The ConnectX cable arrived on October 29, at which point I unpacked the second DGX Spark and began setting it up.
So the first unit was set up on October 23, and I began setting up the second one on October 29 with the cable connected. A serious issue occurred during the setup of the second unit.
After connecting the second DGX Spark to the internet, it automatically started an update process. During the update it rebooted, and after that the display stopped working properly. Across roughly ten boot attempts, it reached the OS only occasionally.
Here are the steps I took to troubleshoot:
Tested multiple HDMI cables (versions 2.1, 2.0, and 1.4) and several monitors (UHD, QHD, and FHD) — all produced the same result.
Followed the official recovery guide (it was extremely difficult even to access the UEFI screen — successful only about once every ten boot attempts).
Performed many power cycles, disconnecting all cables and waiting several minutes between attempts.
Used a wired USB-C keyboard that works fine with Ubuntu Linux, Windows, and macOS.
The same monitor and cables worked perfectly with the other DGX Spark, which was set up successfully on October 23. The problem only occurs with the unit I started setting up on October 29.
At this point, the system can no longer reach even the profile or login screen. Occasionally, it will show the NVIDIA logo and loading animation but then turn black again and fail to continue.
What can I do now?
I’ve already attempted recovery, but the unit remains unresponsive. Should I proceed with an RMA or return request, or is there another way to fix this issue?
I was also stuck with a black screen after the logo, and found that running these two commands over SSH and then rebooting helped:
sudo apt-get purge gdm3
sudo apt-get install gdm3
sudo reboot now
I also had problems accessing the UEFI screen. Once I got in, I changed the timeout setting in the BIOS from 1 to 5 seconds, and that seemed to help.
Thank you. I can now easily access the UEFI, and the system can boot from Power ON → UEFI → OS, but it still cannot boot directly from Power ON → OS. Do you have any ideas on how to fix this?
Yes, both SSH access and ping work. I can also enter the OS through the UEFI, but when I press the power button, the system cannot boot directly into the OS. Furthermore, when the OS fails to boot, both SSH and ping stop working. Do you have any ideas on this?
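One way to tell "the OS never came up" apart from "the OS is up but the display is dark" is to poll the unit from another machine right after pressing the power button. A minimal sketch, assuming a Linux machine with iputils ping (where -W is a timeout in seconds); the function name wait_for_host is made up for illustration:

```shell
# Poll a host with ping until it answers or the attempts run out.
# $1 = host, $2 = number of attempts (default 30), $3 = seconds between attempts (default 5)
wait_for_host() {
  host=$1
  tries=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -c 1: send one probe; -W 2: wait up to 2 seconds for the reply (iputils semantics)
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
      echo "$host answered on attempt $i"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$host did not answer after $tries attempts"
  return 1
}
```

For example, wait_for_host spark-2.local 30 5 (the hostname is hypothetical) keeps trying for a few minutes. If the unit never answers after a plain power-on but does answer after going through the UEFI, that points at the boot path rather than at the desktop.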
Hi, if the screen is still black a few minutes after you turn on the machine, can you still SSH into it even though you see no video?
If so, you can try starting the desktop-related services with the command sudo systemctl start gdm gnome-remote-desktop cups cups-browsed
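Before (and after) starting those services, it can help to see what state each one is in. A rough sketch using systemctl is-active on the same four unit names as the command above; the helper name check_units is hypothetical, and a unit that cannot be queried is simply reported as unknown:

```shell
# Print the state of each unit given as an argument (active, inactive, failed, ...).
check_units() {
  for unit in "$@"; do
    # is-active exits non-zero for anything but "active", so ignore the exit status
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
    printf '%s: %s\n' "$unit" "${state:-unknown}"
  done
}

# Same units as in the systemctl start command above
check_units gdm gnome-remote-desktop cups cups-browsed
```

If gdm shows up as failed, journalctl -b -u gdm should show what it logged during the current boot.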
No, the screen doesn’t turn black a few minutes after turning on the computer; it’s black from the very start because the OS doesn’t load at all. Also, I can only boot into the OS if I go into the UEFI, change nothing, and then press “Save Changes and Exit.” (Once I’m in the OS, everything works perfectly fine. However, if I turn off the power, I can never boot into the OS again unless I enter the UEFI and press “Save Changes and Exit.”) Running the command you suggested did not resolve the problem either.
@cgn1234 are you using the desktop? If not, start your device in multi-user mode and see whether it boots okay: run sudo systemctl set-default multi-user.target and restart. If you get to a login: prompt, then your system is fine! The issue could be caused by the gdm service, as mentioned in several posts.
If you want the desktop back, run sudo systemctl set-default graphical.target
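To confirm which of the two targets is currently set before rebooting, systemctl get-default prints it. A small sketch that wraps it with a plain-language description; the function name describe_default_target is made up, and the descriptions assume the usual Ubuntu setup where graphical.target brings up gdm:

```shell
# Report the default boot target and what it means for the desktop.
describe_default_target() {
  target=$(systemctl get-default 2>/dev/null) || true
  case "$target" in
    graphical.target)  echo "default is graphical.target (desktop via gdm)" ;;
    multi-user.target) echo "default is multi-user.target (text login, no desktop)" ;;
    "")                echo "could not read the default target" ;;
    *)                 echo "default is $target" ;;
  esac
}

describe_default_target
```

This only reads the setting; switching still needs one of the set-default commands above followed by a reboot.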