DGX SPARK GPU crash

I was trying to run OpenWebUI and it would not start. I went to a terminal and ran nvidia-smi, which showed an error saying no GPU… (I did not capture that output).

I restarted the system with the reboot now command, and when it came back up I looked in the logs for errors. I have attached my capture of what are hopefully the relevant entries from syslog.1 and kern.log.
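For anyone wanting to pull similar entries out of their own logs, here is a minimal sketch in Python. It is not how the attached capture was produced. It assumes the logs live at /var/log/kern.log and /var/log/syslog.1 (the usual Ubuntu locations), that the NVIDIA kernel driver tags its messages with NVRM or Xid, and that the user has permission to read those files. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: NVIDIA kernel driver messages carry the "NVRM" prefix
# and GPU error reports carry the "Xid" tag.
GPU_PATTERN = re.compile(r"NVRM|Xid")

def gpu_log_lines(text):
    """Return the lines of a log that mention the NVIDIA driver."""
    return [line for line in text.splitlines() if GPU_PATTERN.search(line)]

def scan_logs(paths=("/var/log/kern.log", "/var/log/syslog.1")):
    """Print driver-related lines from each readable log file."""
    for path in map(Path, paths):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            # Missing file or insufficient permissions: skip it.
            continue
        for line in gpu_log_lines(text):
            print(f"{path.name}: {line}")
```

Calling scan_logs() (with sudo if the logs are not world-readable) prints only the driver-related lines, which is usually a much smaller capture to attach to a report.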

The system is back up and running OK for now. This is just an FYI, as I don’t know where else to report this, but I figured it best to report it since this is “bleeding edge”. (I have been doing bleeding-edge work for over 50 years, though mostly on mainframes.)

dgx spark GPU crashed.txt (59.3 KB)

Thank you for reporting this. If you see the behavior repeat, please reach back out.

My DGX Spark crashed multiple times while I was reading the NVIDIA documentation on the web in Firefox.

I was not even running anything else.

The Spark is not even one day old.

Can you describe more about the crash? Was this just a Firefox application crash or did the system freeze or reboot? If you can include an sos report this might help in analyzing the issue.

What kind of display do you have? HDMI or USB-C?

Now my GPU is crashing during the day. I reboot the system, which takes 10-15 minutes with what appears to be multiple boots: I can ping the physical address but not the spark-xxxx.local one, and the .local name goes away a couple of times before spark-xxxx.local becomes available.

nvidia-bug-report.log.gz (1.4 MB)

I rebooted the system and ran nvidia-bug-report.sh (it will not run while the GPU is crashed).

From the bug report, your system seems to be updated as both the GPU driver and kernel have been updated from the original ones installed in the factory. It should only reboot once during a reboot unless it is failing to apply some firmware updates. Has this Spark been reimaged with the recovery media?

Yes, I had to reimage it from a USB boot.

The image was dgx-spark-recovery-image-1.91.51.

I saw that there was a new recovery image, so I installed it, and I have NOT had my GPU crash since. It was crashing at least once a day; now it hasn’t crashed in the last few days. I have a process running to let me know if it does crash, and it hasn’t reported anything. I am still getting the system set up so I can monitor it. So far, any problems I have had have been my own doing.
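The process mentioned above was not shared, so the following is only a sketch of one way such a check could work, written in Python. It assumes nvidia-smi exits with a non-zero status (or hangs) when it cannot reach the GPU. The function names, the 60-second interval, and the print-based notification are illustrative choices, not the setup actually used here.

```python
import subprocess
import time

def gpu_is_healthy(cmd=("nvidia-smi",), timeout=30):
    """Return True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or hung: treat as unhealthy.
        return False
    return result.returncode == 0

def watch(interval=60, check=gpu_is_healthy, notify=print, max_checks=None):
    """Poll the GPU and call notify() once for every failed check.

    max_checks=None polls forever; a number stops after that many checks.
    """
    done = 0
    while max_checks is None or done < max_checks:
        if not check():
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            notify(f"{stamp} nvidia-smi check failed")
        done += 1
        if max_checks is None or done < max_checks:
            time.sleep(interval)
```

Running watch() under something like a systemd service or a tmux session keeps it polling in the background. The notify argument can be swapped for a function that writes to a file or sends a message, since console output is easy to miss on a headless box.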

So at least for this problem it has been resolved.

Thanks

The recovery image I used was dgx-spark-recovery-image-1.91.51-1.tar.gz.
The prior image was the one without the -1.

This has solved the GPU crash problem I was having.
