TL;DR: Brand new DGX Spark became a $10K+ brick within 24 hours. System crashes during routine apt upgrade, then Docker container causes complete freeze. Cannot access UEFI reliably, USB boot completely fails, all recovery methods non-functional. 12+ hours troubleshooting, system is unusable. Has anyone else experienced this?
Background
Purchased DGX Spark for local LLM inference (Kimi K2.5, DeepSeek, etc.) and blockchain node operations. This is my first NVIDIA enterprise product - previously used Mac Studio and cloud GPUs.
System arrived Wednesday. Initial setup went fine, got through the first-boot wizard, everything looked beautiful. Then things went catastrophically wrong.
Day 1: The Update That Broke Everything
Thursday morning, January 29, 2026
Did what any reasonable person does with a new Linux system:
sudo apt update && sudo apt upgrade -y
Standard stuff, right? Mid-update, the system spontaneously rebooted. No warning, no error message, just… gone.
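For anyone about to do the same thing, a simulated run shows what an upgrade would touch before anything is installed. This is a minimal sketch, assuming the standard `apt-get -s` (simulate) flag, whose output lists each planned install on a line starting with `Inst`. The `risky_upgrades` helper name and the choice of `linux-` and `nvidia-` prefixes as the packages worth a second look are assumptions made for illustration, not anything NVIDIA documents.

```shell
# Hypothetical helper: read `apt-get -s upgrade` output on stdin and print
# the kernel and NVIDIA packages the upgrade would install.
risky_upgrades() {
    # Simulated output marks each planned install as "Inst <package> ..."
    grep -E '^Inst (linux-|nvidia-)' | awk '{ print $2 }'
}

# Usage (simulation only, nothing is installed):
#   sudo apt-get update
#   apt-get -s upgrade | risky_upgrades
```

If that list is non-empty, it may be worth running the real upgrade from a text console or inside tmux/screen, so a display or driver restart cannot take the upgrade session down with it.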
When it came back up, I had to reconfigure EVERYTHING:
- Keyboard mapping
- Mouse settings
- Display configuration
- Network settings
It was like the system forgot it had been set up at all.
Boot logs showed these errors:
platform NVDA8800:00: failed to claim resource 0: [mem 0x05170000-0x051c...]
acpi NVDA8800:00: platform device creation failed: -16
platform NVDA8900:00: failed to claim resource 0: [mem 0xc8000000-0xd7ff...]
acpi NVDA8900:00: platform device creation failed: -16
ACPI resource conflicts on a brand-new, out-of-box NVIDIA system? For NVIDIA devices? That seemed… wrong.
Spent 2 hours adding kernel parameters to work around it:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,921600 pci=realloc pci=nocrs acpi=force iommu=pt nvidia-drm.modeset=1"
Got it stable-ish. Figured maybe it was just a quirk. Moved on.
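Parameters set in GRUB_CMDLINE_LINUX_DEFAULT only take effect once the GRUB config has been regenerated (`sudo update-grub` on Ubuntu-based systems) and the machine rebooted. `/proc/cmdline` shows what the running kernel was actually booted with. A small sketch for checking that, assuming the line above lives in /etc/default/grub; `has_param` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a kernel command line contains a parameter
# as a whole word (so "pci=nocrs" does not match "pci=nocrsx").
has_param() {
    cmdline=$1
    param=$2
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage after `sudo update-grub` and a reboot:
#   for p in pci=realloc pci=nocrs acpi=force iommu=pt; do
#       has_param "$(cat /proc/cmdline)" "$p" || echo "missing: $p"
#   done
```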
Day 2: The Crash That Killed Everything
Friday morning, January 30, 2026
Time to actually use this thing for what I bought it for. Tried deploying Kimi K2.5 locally via vLLM in Docker:
docker run -d \
  --name kimi-k2.5 \
  --gpus all \
  --restart always \
  -p 8000:8000 \
  -v ~/models/kimi-k2.5:/model \
  vllm/vllm-openai:latest \
  --model /model \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code
Standard vLLM deployment. Nothing exotic.
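In hindsight, a more cautious first launch would use the same image and arguments but no restart policy, so a container that takes the host down cannot come back at the next boot. The sketch below uses `--restart no`, which is Docker's default policy. The `DOCKER` variable and the `run_kimi_once` function name are assumptions added so the command can be dry-run with `echo`.

```shell
# DOCKER is an assumption for this example: set DOCKER=echo to print the
# command instead of running it.
DOCKER=${DOCKER:-docker}

# First launch with no restart policy; everything else matches the original.
run_kimi_once() {
    "$DOCKER" run -d \
        --name kimi-k2.5 \
        --gpus all \
        --restart no \
        -p 8000:8000 \
        -v "$HOME/models/kimi-k2.5:/model" \
        vllm/vllm-openai:latest \
        --model /model \
        --host 0.0.0.0 \
        --port 8000 \
        --trust-remote-code
}

# Usage:
#   run_kimi_once && docker logs -f kimi-k2.5
```

Once the container has survived a few runs, the policy can be changed in place with `docker update --restart unless-stopped kimi-k2.5`.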
System immediately crashed. Complete freeze. Had to hard reset (held power button).
On reboot: System completely inoperable.
What I see:
- Ubuntu login screen appears
- Can type password (keyboard works!)
- Press Enter…
- Desktop starts to load…
- Complete freeze
- Mouse cursor frozen
- Keyboard stops responding
- Nothing works
Can’t access:
- ❌ Ctrl+Alt+F1-F6 (TTY consoles) - No response
- ❌ Ctrl+Alt+T (terminal) - No response
- ❌ Ctrl+Alt+Backspace - No response
- ❌ Any keyboard shortcuts - No response
SSH: Connects for maybe 10-15 seconds after boot, then freezes. Not enough time to run commands.
The Docker container with --restart always is launching on boot, crashing the system, preventing me from disabling it. Classic catch-22.
The 12-Hour Troubleshooting Odyssey
Attempt 1: GRUB Recovery Mode
Tried to access GRUB:
- Hold Shift during boot - Doesn’t work
- Tap Esc repeatedly - Works maybe 1 in 20 times
- Hold Shift while pressing power - Doesn’t work
- Ctrl+Alt+Del then Esc - Doesn’t work
When I DID get GRUB once:
- Got to the `grub>` prompt
- Typed `normal` - system booted to frozen login again
- Tried editing boot parameters (press ‘e’):
  - Added `systemd.unit=rescue.target` - System ignored it
  - Added `single init=/bin/bash` - System ignored it
  - Added `systemd.unit=multi-user.target` - System ignored it
- Boot modifications are not being applied. Why?
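For reference, in GRUB's edit screen the parameters go at the end of the line that begins with `linux`, and the edited entry is booted with Ctrl+X or F10 (Esc discards the edit). systemd also documents a `systemd.mask=` kernel parameter that masks a unit for that one boot, which would keep Docker from starting without needing a login. The sketch below only shows the edit. The `add_param` helper and the example `linux` line are hypothetical, and this is untested on DGX OS.

```shell
# Hypothetical helper: append a parameter to a kernel "linux" line unless it
# is already present as a whole word.
add_param() {
    line=$1
    param=$2
    case " $line " in
        *" $param "*) printf '%s\n' "$line" ;;
        *)            printf '%s %s\n' "$line" "$param" ;;
    esac
}

# Usage with a placeholder line (the real one will differ):
#   line='linux /boot/vmlinuz root=/dev/example ro quiet'
#   line=$(add_param "$line" systemd.mask=docker.service)
#   line=$(add_param "$line" systemd.mask=docker.socket)
#   line=$(add_param "$line" systemd.unit=multi-user.target)
```

Masking both units matters because `docker.socket` can otherwise try to start the service on demand.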
Attempt 2: UEFI/BIOS Access
Keys I’ve tried to access UEFI:
- F1, F2, F8, F10, F11, F12, Del, Esc
- Hold methods, tap methods, various timing combinations
- Pressed every key imaginable during boot
Success rate: ~10% (got in maybe 2-3 times out of 25+ attempts)
When I DID get into AMI Aptio Setup:
- Boot tab shows: “Boot Option #2: UEFI PXE IPv4…”
- Boot Option #1 is missing (where’s the internal drive?)
- USB drives never appear in boot options (more on this below)
- Can’t find Secure Boot setting anywhere
For a $10K+ system, I expect UEFI access to work 100% of the time, not 10%.
Attempt 3: USB Recovery Boot
Created Ubuntu 24.04.3 LTS bootable USB:
- Used `dd` on macOS: `sudo dd if=ubuntu.iso of=/dev/rdisk6 bs=1m`
- Verified completion: 753+0 records in/out, 789,577,728 bytes transferred
- Checked structure on Mac: EFI partition present, bootable structure confirmed
- This USB boots fine on my MacBook and other systems
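One thing worth ruling out is architecture. The GB10 in the DGX Spark is Arm-based, so its UEFI firmware looks for the AArch64 removable-media loader (`EFI/BOOT/BOOTAA64.EFI`), while an amd64 image carries `BOOTX64.EFI` instead. The post does not say which ISO was written, so this is only a check, not a diagnosis. The sketch assumes the USB stick's EFI partition (or the ISO) is mounted somewhere readable; `efi_loader_arch` and the mount point in the usage line are hypothetical.

```shell
# Hypothetical helper: list which UEFI removable-media loaders exist under a
# mounted USB/ISO root. Matching is case-insensitive because FAT and ISO9660
# images differ in how they case these names.
efi_loader_arch() {
    root=$1
    find "$root" -type f -iname 'boot*.efi' 2>/dev/null |
    while IFS= read -r path; do
        name=$(basename "$path" | tr '[:upper:]' '[:lower:]')
        case $name in
            bootaa64.efi) echo "arm64" ;;
            bootx64.efi)  echo "x86_64" ;;
        esac
    done | sort -u
}

# Usage (mount point is an example):
#   efi_loader_arch /Volumes/UBUNTU
```

If only `x86_64` comes back, the stick was written from an amd64 image. Ubuntu's file names carry the architecture as a suffix (`-arm64` or `-amd64`), which is a quicker check when the original ISO name is still around.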
Tried to boot from USB on DGX:
- ❌ USB never appears in UEFI boot options (tried multiple times)
- ❌ Changed boot priority to USB #1 in UEFI - system still boots to internal drive
- ❌ F11 boot menu - Never appears
- ❌ F12 boot menu - Never appears
- ❌ F8 boot menu - Never appears
- Tried front USB ports - Not detected
- Tried rear USB ports - Not detected
- Tried USB 2.0 ports - Not detected
- Tried USB 3.0 ports - Not detected
- Tried different USB drives - None detected
The DGX Spark cannot detect USB boot media at all. This makes standard recovery impossible.
Attempt 4: Different Keyboards/Mice
Thought maybe it was peripheral compatibility:
- Swapped to wired USB keyboard - Same issue
- Swapped to wireless keyboard (different brand) - Same issue
- Tried different USB ports - Same issue
- Multiple mice tested - Same issue
Keyboard works perfectly for password entry, then complete lockup. Since the login screen itself renders and accepts input, the freeze seems to happen while the desktop session starts, after the display manager is already up.
Attempt 5: SSH Recovery Loop
Since SSH worked briefly, tried automated recovery:
while true; do
  ssh wulfkaal@10.0.0.186 "sudo systemctl stop docker; sudo systemctl disable docker" && break
  sleep 1
done
Connected 3-4 times, but connection drops in <15 seconds before commands complete. Not enough time to disable Docker.
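A tighter version of the same idea might have a better chance inside a 10-15 second window: one remote command instead of two, a short connect timeout, and no interactive prompts. This is a sketch under two loud assumptions: key-based SSH login is already set up, and the account has passwordless sudo (`sudo -n` fails immediately otherwise rather than hanging on a prompt). The `retry` and `disable_docker_remote` names are hypothetical.

```shell
# Generic retry: run a command until it succeeds or the attempts run out.
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$max" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Assumes key-based login and passwordless sudo on the target.
# "disable --now" stops and disables in one call; the socket unit is
# included so it cannot start the service again on demand.
disable_docker_remote() {
    ssh -o BatchMode=yes -o ConnectTimeout=3 wulfkaal@10.0.0.186 \
        'sudo -n systemctl disable --now docker.service docker.socket'
}

# Usage: start this, then power-cycle the Spark.
#   retry 600 1 disable_docker_remote
```

If the daemon does stay up long enough, `docker update --restart no kimi-k2.5` is the narrower fix, since it only removes the restart policy from the one container.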
Current Status: Completely Bricked
What I have:
- ✅ Expensive green paperweight
- ✅ System that boots to a frozen login screen
- ✅ Keyboard that works for exactly 10 seconds
- ✅ 12+ hours of wasted research time
- ✅ Zero productive work accomplished
What I don’t have:
- ❌ Access to UEFI when I need it
- ❌ USB boot capability
- ❌ TTY console access
- ❌ Recovery options that work
- ❌ A functioning AI workstation
Technical Analysis
Issue 1: ACPI Resource Failures
The boot logs show systematic ACPI resource allocation failures for NVIDIA devices. These errors appear every boot:
platform NVDA8800:00: failed to claim resource 0
acpi NVDA8800:00: platform device creation failed: -16
This suggests firmware-level issues with device initialization.
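A note on the number: -16 is EBUSY on Linux ("Device or resource busy"), which usually means the memory range was already claimed by something else when the ACPI platform device tried to take it. A small sketch for pulling the affected devices out of a saved kernel log so they can be compared against `/proc/iomem`; the `acpi_create_failures` helper name is hypothetical, and the pattern is based only on the two messages quoted above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<device> <errno>" for each failed ACPI platform device creation.
acpi_create_failures() {
    sed -n 's/.*acpi \([^ ]*\): platform device creation failed: \(-[0-9][0-9]*\).*/\1 \2/p'
}

# Usage:
#   sudo dmesg | acpi_create_failures
#   sudo cat /proc/iomem    # shows which driver or region owns each range
```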
Issue 2: USB Controller Failure
USB devices are completely invisible to UEFI boot options. I’ve created verified bootable USBs that work on other systems, but the DGX cannot see them at all. This matches the pattern I found searching these forums, in a thread titled “DGX Spark is Inoperable: Failed USB Controller/Firmware”.
Issue 3: GRUB/Bootloader Instability
GRUB access is inconsistent (roughly 1 in 20 attempts), and when accessible, boot parameter modifications are ignored. This isn’t normal behavior for Ubuntu systems.
Issue 4: Post-Login Freeze
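System freezes specifically after authentication, which points at the GPU driver or desktop session startup rather than the login screen itself. The timing is consistent: password works → desktop starts loading → complete freeze. It also overlaps with the window in which the auto-restarting Docker container comes up, so I can’t separate the two causes from here.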
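If the system becomes reachable again, the kernel log from the boot that froze should help tell these apart. `journalctl -k -b -1` prints kernel messages from the previous boot, assuming the journal is stored persistently. The filter below is a sketch: `gpu_lines` is a hypothetical helper, and the patterns (`NVRM`, `Xid`, `nvidia`) are a guess at what NVIDIA driver messages will contain.

```shell
# Hypothetical helper: keep only log lines that look GPU-driver related.
gpu_lines() {
    grep -iE 'NVRM|Xid|nvidia'
}

# Usage (previous boot, kernel messages only):
#   journalctl -k -b -1 --no-pager | gpu_lines
```

If the driver messages stop well before the freeze and the last entries are Docker or container related, that would point back at the container rather than the desktop.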
Questions for the Community
Has anyone else experienced this?
Specifically:
- USB boot failures - UEFI not detecting bootable USB drives?
- UEFI access inconsistency - F2/Del/Esc only working occasionally?
- Post-login freeze - System locks up after password entry?
- ACPI resource errors - Platform device creation failures on boot?
- Docker container crashes - vLLM or other GPU containers causing system freeze?
Questions:
- Is there a known firmware version that fixes USB detection?
- Is there a special key combination for UEFI I’m missing?
- Has anyone successfully recovered from a similar state?
- Should USB boot just… work? Or is there a setting I’m missing?
- Is this a known issue with DGX Spark? (I’m seeing hints in old forum posts)
What I Need
Immediate:
- Way to disable Docker service without GUI/SSH access
- Reliable method to access UEFI/BIOS
- Solution for USB boot detection failure
- Or… any other recovery method I haven’t tried
Long-term:
- Firmware update addressing these issues?
- Confirmation this is a known problem?
- Timeline for fixes?
My Setup
- Model: DGX Spark
- GPU: NVIDIA GB10 (Blackwell)
- OS: DGX OS (Ubuntu 24.04 based) - out of box, minimal changes
- Purchase: January 2026
- Use case: Academic AI research (LLM inference, blockchain nodes)
- IP: 10.0.0.186 (for reference)
Why This Matters
I’m not just annoyed - I’m genuinely concerned:
- Research Impact: This was purchased for time-sensitive academic research. Every day it’s down is a day of lost productivity.
- Recovery Design Flaw: A professional AI workstation should have reliable recovery options. USB boot is the standard recovery method for Linux systems. If that doesn’t work, what’s the fallback?
- Update Stability: A system marketed for AI development should handle a standard `apt upgrade` without catastrophic failure.
- Community Pattern: Searching these forums, I see similar reports (USB boot issues, UEFI access problems, post-update instability). Are these isolated incidents or a systemic issue?
What I’ve Learned
For others considering DGX Spark:
❌ Don’t run apt upgrade without backups
❌ Don’t use Docker --restart always flag
❌ Don’t expect USB recovery to work
❌ Don’t expect UEFI access to be reliable
❌ Don’t expect standard Linux recovery procedures to work
✅ Do have an enterprise NVIDIA support contract
✅ Do have someone who can physically access the system
✅ Do consider alternatives (Lambda Vector, HP/Dell workstations, custom builds)
Request to NVIDIA
If NVIDIA support is reading this:
I need help today. I cannot wait for firmware updates or lengthy support processes. This system is:
- Defective out of box (USB boot failure)
- Unstable (crashes from standard operations)
- Unrecoverable (all standard recovery methods fail)
- Unfit for purpose (cannot run AI workloads)
I need either:
- Immediate remote assistance to recover the system
- RMA/replacement unit with verified firmware
- Full refund so I can purchase reliable hardware
I’m an academic researcher, not an enterprise customer with an IT department. I need a workstation that works.
Call to Community
If you’ve experienced similar issues:
- Please reply with your experience
- Share any workarounds you’ve found
- Confirm if this is a known issue
- Help me validate this isn’t just “user error”
If you work at NVIDIA:
- Please escalate this
- Please confirm if USB boot is a known issue
- Please provide timeline for firmware fixes
- Please help me recover this system
If you’re considering DGX Spark:
- Read this carefully
- Search forums for similar issues
- Consider alternatives
- Have a backup plan
Update History
January 30, 2026 (13:00 CST): Initial post
January 30, 2026 (18:00 CST): Attempted USB recovery - failed
January 31, 2026: [will update with any progress]
I want to love this system. The hardware is beautiful. But right now, it’s unusable.
Has anyone else had these issues? Am I the only one? Please help.
System Info for NVIDIA Support:
- Hostname: spark-76b8
- IP: 10.0.0.186
- Purchase: January 24, 2026
- Order #155970
- Logs available on request
#DGXSpark #NVIDIASupport #RecoveryHelp #USBBootFailure #UEFIIssues