My DGX Spark stopped booting after a period of degradation. It now appears to have a firmware/storage boot failure.
Timeline:
- The system began degrading about 72 hours before total failure.
- Services became unstable, then the device was rebooted.
- After reboot, the DGX Spark no longer came back on the network.
- With monitor/keyboard attached, it repeatedly returned to BIOS instead of booting Ubuntu.
- The Ubuntu boot entry was present at first, then later disappeared or failed.
- Selecting Ubuntu through the boot override also returned to BIOS.
- UEFI showed EFI files under \EFI\ubuntu, including grubaa64.efi and shimaa64.efi, but selecting them did not boot.
- The device now sometimes shows “Synchronous Exception at 0x00000000856668E4” on the NVIDIA splash screen before BIOS/boot.
Important symptoms:
- BIOS/UEFI loop.
- NVIDIA splash “Synchronous Exception”.
- Ubuntu boot path fails.
- Internal Samsung NVMe was visible in BIOS earlier: SAMSUNG MZALC4T0HBL1-00B07.
- No OS recovery/reflash has been run because I need to preserve data.
Data-preservation attempt:
- I removed the internal M.2 NVMe only after the system could not boot, to attempt read-only data recovery.
- On another DGX Spark, the drive appears as a 3.7 TB Samsung device with an EFI partition and an ext4 root partition.
- Read-only mount attempts hang or fail.
- rsync/read attempts return Input/output errors.
- fsck.ext4 was run with no-write flags only: fsck.ext4 -fn.
- fsck reported: “Error reading block 552 (Input/output error)”.
- Kernel logs showed errors like:
  - critical target error, dev sda, sector …
  - I/O error, dev sda
  - EXT4-fs warning/error reading directory block
  - FAT-fs corrupted directory on the EFI partition
- I have not run repair, format, recovery, reinstall, or destructive commands.
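To help with triage, the unreadable regions could be mapped without writing anything to the drive. The following is a minimal sketch in Python, not a vetted recovery tool. It assumes the drive still appears as /dev/sda on the second machine (as in the kernel logs above) and that the script runs with permission to read the device. The function name scan_readonly and the 1 MiB chunk size are illustrative choices, not part of any NVIDIA or Samsung tooling.

```python
import os


def scan_readonly(path, chunk_size=1024 * 1024, limit=None):
    """Read `path` sequentially in read-only mode.

    Returns (bytes_read_ok, bad_ranges), where bad_ranges is a list of
    (offset, length) tuples for chunks that raised an I/O error.
    Nothing is ever written to `path`.
    """
    bad_ranges = []
    bytes_ok = 0
    fd = os.open(path, os.O_RDONLY)  # read-only open: no write access requested
    try:
        # Seeking to the end reports the size of a file or Linux block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        if limit is not None:
            size = min(size, limit)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # Record the failing chunk and skip past it instead of retrying.
                bad_ranges.append((offset, length))
                offset += length
                continue
            if not data:
                break  # unexpected end of device or file
            bytes_ok += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_ok, bad_ranges
```

For example, scan_readonly("/dev/sda", limit=1 << 30) would probe only the first 1 GiB and return the chunks that failed. Every read puts further load on a failing drive, so a scan like this is best kept short. A dedicated imaging tool such as GNU ddrescue, which copies readable areas to a separate disk and logs the unreadable ones, may be the safer option for a full pass.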
Request:
Please advise whether this qualifies for warranty/RMA service. I would also like guidance on the safest data-preserving path before any factory recovery or SSD reflash, since NVIDIA recovery appears likely to erase the internal SSD.