Random reboots and 00 screen

Hi,
my DGX SPARK keeps rebooting automatically after a few minutes of use or even while idle. I’ve already tested with a power cable from another DGX SPARK and the issue remains the same.

Below are the steps I have already completed following NVIDIA Support instructions:

  1. Power and connections check: Verified stable power source and confirmed that all cables (power, HDMI/DisplayPort, USB) are securely connected.
    Note: The DGX SPARK has no chassis LEDs to check for fault patterns.

  2. BIOS access: I can access BIOS normally. I loaded the optimized defaults, saved, and rebooted.
    Issue persists.

  3. Recovery USB: Created a DGX SPARK recovery USB on another system using the latest recovery image and the CreateUSBKey script.
    Issue persists.

  4. OS Reinstallation: Booted from the recovery USB and selected “Reinstall OS and Drivers.”
    Issue persists.

  5. Factory Reset: Powered off the unit, held the power button for 15 seconds, then booted again from the recovery USB and performed “START RECOVERY / Factory Reset” to fully reflash the internal SSD.
    Issue persists.

  6. Post-reset validation: The system launched the Setup Wizard correctly, but after logging in the device still reboots randomly within minutes, even without running any workload.
    Issue persists.

There is no specific trigger for the reboot. It happens randomly, but very frequently. The longest it has stayed on without rebooting is about 15 minutes.

I appreciate any guidance on additional diagnostics or next steps.

To help you debug this, can you send me an nvidia-bug-report as well as logs from your dmesg and kernel?
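
A minimal sketch of collecting the three items requested above, assuming a root shell and that `dmesg`, `journalctl`, and `nvidia-bug-report.sh` are on the PATH. The function name `collect_spark_logs` and the output file names are illustrative only, not part of any NVIDIA tooling.

```shell
# Hypothetical helper: gather the requested logs into one directory.
# Run as root on the affected unit; a step is skipped if its tool is missing.
collect_spark_logs() {
    outdir="${1:-./spark-logs}"
    mkdir -p "$outdir" || return 1

    # Kernel ring buffer of the current boot.
    if command -v dmesg >/dev/null 2>&1; then
        dmesg > "$outdir/dmesg.txt" 2>&1 || echo "dmesg failed (not root?)" >&2
    fi

    # Kernel messages of the current boot from the systemd journal.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -k -b --no-pager > "$outdir/kernel-journal.log" 2>&1 \
            || echo "journalctl failed" >&2
    fi

    # nvidia-bug-report.sh writes nvidia-bug-report.log.gz into the
    # directory it is run from, so run it inside the output directory.
    if command -v nvidia-bug-report.sh >/dev/null 2>&1; then
        (cd "$outdir" && nvidia-bug-report.sh) || echo "bug report failed" >&2
    fi
    return 0
}

# Usage (as root): collect_spark_logs ./spark-logs
```

Because the unit may reboot mid-collection, it can help to run this right after boot and copy the directory off the machine immediately.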

Sure, here you go. By the way, it restarted twice while I was collecting these.
bug-report.txt (2.3 KB)
dmesg.txt (107.6 KB)
kernel-journal.log (134.8 KB)

There should be a 'nvidia-bug-report.log.gz' file in the directory where you ran the bug report command. Can you send that as well?

Can you check that you are using the power supply that came with the DGX Spark? It should be a 240W USB-C power supply, connected to the USB-C port next to the power button.

Here it is, attached.
nvidia-bug-report.log.gz (379.4 KB)
I have tried another DGX SPARK power cable, because I have 2 more, and that does not seem to be the problem. Based on my 9 years of experience as an NVIDIA Elite Partner, I think it is probably a hardware error.

Thanks for trying a different cable. After you experience a random reboot, can you share the output of journalctl -k -b -1 -e? This may give a better clue as to why it’s shutting down
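
Since the previous-boot log can run to thousands of lines, a filter like the following may help narrow it down before sharing. This is only a sketch: the keyword list is a guess at what might be relevant, not an official list, and `filter_reboot_clues` is a hypothetical name. Note that `-b -1` only returns data if the journal is persisted across reboots.

```shell
# Hypothetical filter: keep kernel log lines that often accompany an
# unexpected reset. Reads stdin, prints the last N matches (default 40).
# The keyword list is an assumption; adjust it to taste.
filter_reboot_clues() {
    grep -iE 'watchdog|panic|oops|thermal|hardware error' | tail -n "${1:-40}"
}

# Usage:
#   journalctl -k -b -1 --no-pager | filter_reboot_clues
```

An empty result does not rule anything out: a hard reset can happen before the kernel gets a chance to log it.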

Sure, here you go.

journalctl.txt (194.7 KB)

Hi mceballos,

I am having the exact same issue on one of my DGX boxes. Was there any resolution suggested by Nvidia support?

Thanks!

Hi, if your problems persist, we recommend you run the DGX Spark Field Diagnostics to see if there is an issue with your unit

(Adding the CUDA repository in Step 2 may be redundant: just make sure that the sudo apt install dgx-spark-fieldiag command successfully runs)

Hi, and thank you for your reply! I followed the instructions, installed and ran the field diagnostics, and they didn’t detect any hardware problems with this unit. All tests came back OK.

In the meantime, I read that someone suggested turning off the watchdog setting in the BIOS (Advanced → Advanced → Watchdog). This is the only way I could get this unit to stay up and working, which I realize is not the best thing to do.

Any other suggestions, with all that said and done?

Appreciate your support!

The watchdog is managed by the sbsa_gwdt kernel module. You can check if it’s loaded using:

$ lsmod | grep sbsa
sbsa_gwdt             196608  1

Not sure why it wouldn’t be loaded with the default image. To load it manually, use sudo modprobe sbsa_gwdt (insmod needs the full path to the .ko file), but that only lasts until the next reboot. To load it at every boot, add it to a file such as /etc/modules-load.d/watchdog.conf:

# Load sbsa_gwdt.ko at boot
sbsa_gwdt
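
Put together, the check and the persistent fix might look like the sketch below. It uses `modprobe`, which resolves a module by name, rather than `insmod`, which expects a path to the `.ko` file. The function names and the extra file parameters are illustrative, added so the helpers can be tried against a scratch file first.

```shell
# Succeeds if the named module appears in the loaded-module list.
# Reads /proc/modules by default; another file can be passed for a dry run.
is_module_loaded() {
    module="$1"
    modules_file="${2:-/proc/modules}"
    grep -q "^${module} " "$modules_file"
}

# Writes a modules-load.d style file that names one module to load at boot.
write_modules_load_conf() {
    module="$1"
    conf="$2"
    printf '# Load %s at boot\n%s\n' "$module" "$module" > "$conf"
}

# Usage (as root):
#   is_module_loaded sbsa_gwdt || modprobe sbsa_gwdt
#   write_modules_load_conf sbsa_gwdt /etc/modules-load.d/watchdog.conf
```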

Interesting. I am going to test this now by re-enabling the watchdog in the BIOS and adding the module under /etc/modules-load.d as you suggested, and will come back if I see any issues again.

Thank you so much for your help!

Hi @mceballos, we just released an OS and kernel update today. Please update your unit to the latest version and let us know if you still experience unexpected reboots. If you still do, can you generate and send me an sos report that I can have engineering review?

@czankel Thank you so much for this post! It was spot on! For some reason after one of the kernel updates (I guess) somehow sbsa_gwdt was blacklisted and wasn’t loading, so removing the blacklist did the job.
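
For anyone hitting the same thing, here is a sketch of how to look for such a blacklist entry. The directories in the usage line are the usual modprobe configuration locations on Linux; whether the entry lives in one of them on a given unit is an assumption, and `find_module_blacklist` is a hypothetical helper name.

```shell
# Hypothetical helper: print file:line:text for every active (uncommented)
# "blacklist <module>" entry under the given files or directories.
# Returns non-zero when nothing is found.
find_module_blacklist() {
    module="$1"
    shift
    # modprobe treats '-' and '_' in module names as equivalent.
    pattern=$(printf '%s' "$module" | sed 's/[_-]/[_-]/g')
    grep -rnE "^[[:space:]]*blacklist[[:space:]]+${pattern}([[:space:]]|\$)" "$@" 2>/dev/null
}

# Usage:
#   find_module_blacklist sbsa_gwdt /etc/modprobe.d /usr/lib/modprobe.d /run/modprobe.d
```

Commenting out or deleting the matching line and then rebooting (or loading the module by hand) is one way to undo the blacklist. Which file carries the entry, and which package put it there, is not stated in this thread.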

Thank you again!

Hi moderator, I downloaded the latest system update today and I’m having reboot issues. Can you tell me what I should send you? I have a Founders Edition GB10 DGX Spark.

Run Field Diagnostics

Run as root. Console or SSH is supported.

sudo init 3
cd /opt/nvidia/dgx-spark-fieldiag
sudo ./partnerdiag --field

I did this, and my display is gone now.

How did you remove it?

Hi @Sanyam0605, the command sudo init 3 temporarily puts your system in TTY-only mode. This is required to run the field diagnostics. After you run the diagnostics, you can restart your machine to bring the desktop back. If the diagnostics show failed tests, please open a case with customer support.
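
For context, on a systemd-based OS `init 3` maps to the text-only `multi-user.target`, so the desktop can usually be brought back without a full reboot by switching to the graphical target. The sketch below assumes DGX OS follows the standard systemd runlevel mapping; `target_for_runlevel` is a hypothetical helper, not an existing command.

```shell
# Hypothetical helper: translate a SysV runlevel number into the
# systemd target it conventionally maps to.
target_for_runlevel() {
    case "$1" in
        0) echo poweroff.target ;;
        1) echo rescue.target ;;
        2|3|4) echo multi-user.target ;;   # text mode, no desktop
        5) echo graphical.target ;;        # desktop session
        6) echo reboot.target ;;
        *) return 1 ;;
    esac
}

# Usage (after the diagnostics finish, instead of rebooting):
#   sudo systemctl isolate "$(target_for_runlevel 5)"
```

A reboot, as suggested above, has the same effect and is the safer choice if the diagnostics left the system in an unknown state.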

Yeah, my bad, I panicked at the time. The test came out as PASS in the summary JSON, so the hardware is fine and nothing is alarming there. But then why does the system keep rebooting? Right now the only workaround I have to stop the reboots is that manual thing.

But I’m assuming this is a temporary fix.