Warranty Claim - DGX Spark Unit Defective

I purchased 2 NVIDIA DGX Spark units on October 25, 2025. One unit (Spark-B) works perfectly. The second unit (Spark-A) repeatedly crashes under sustained GPU load.

Issue:

Unit powers off abruptly during GPU inference workload
No kernel errors or warnings in logs; the unit just stops mid-operation
Occurs within 30-60 minutes of sustained GPU use
4 crashes in 3 days under normal workload

Comparison:

Identical Spark-B unit running same workload, same duration: zero crashes
Same software, same power source, same environment

Evidence attached:

Purchase receipt
System logs showing crash pattern (boots end mid-inference with no errors)

I expect a replacement unit.

Olivier Roy

For the "Spark B" unit, please install and run NVIDIA DGX Spark Field Diagnostics, then share the resulting logs with me (DM me please). Thank you.

Submitted the requested details via DM on Feb. 25th. Still awaiting a response. The support case is also unresponsive.

Hi,

Based on the log bundles you shared, Field Diagnostics did not run successfully. The logs indicate that the GPU driver could not be unloaded.

This is likely because a process or service is currently using the GPU. The driver must be fully unloaded before Field Diagnostics can run.

Please power cycle the unit, then run Field Diagnostics again.

Launching field diagnostics ...

Command Line: ./partnerdiag --field

Removing Nvidia drivers and services...

Stopping 'docker.service', but its triggering units are still active:

docker.socket

Stopping 'systemd-udevd.service', but its triggering units are still active:

systemd-udevd-control.socket, systemd-udevd-kernel.socket

rmmod: ERROR: Module nvidia_drm is in use

rmmod: ERROR: Module nvidia_drm is in use

rmmod: ERROR: Module nvidia_drm is in use

rmmod: ERROR: Module nvidia_drm is in use
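
To see which kernel modules blocked the driver unload, a log like the one above can be scanned for the rmmod "in use" errors. The following is only a rough sketch, not part of Field Diagnostics: the function names `modules_still_in_use` and `loaded_module_refcounts` are hypothetical, and the second helper assumes the standard Linux /proc/modules layout (name, size, reference count, ...).

```python
import re
from collections import Counter

# Matches lines such as "rmmod: ERROR: Module nvidia_drm is in use"
RMMOD_IN_USE = re.compile(r"rmmod: ERROR: Module (\S+) is in use")


def modules_still_in_use(log_text):
    """Count how often each module was reported as 'in use' by rmmod."""
    return Counter(m.group(1) for m in RMMOD_IN_USE.finditer(log_text))


def loaded_module_refcounts(proc_modules_text, prefix="nvidia"):
    """Return {module_name: refcount} for loaded modules starting with prefix.

    Assumes the /proc/modules layout: name, size, refcount, dependents, ...
    Lines whose refcount is not a number are skipped.
    """
    refcounts = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith(prefix) and fields[2].isdigit():
            refcounts[fields[0]] = int(fields[2])
    return refcounts


if __name__ == "__main__":
    # Sample text copied from the log excerpt in this thread.
    sample_log = (
        "Removing Nvidia drivers and services...\n"
        "rmmod: ERROR: Module nvidia_drm is in use\n"
        "rmmod: ERROR: Module nvidia_drm is in use\n"
    )
    for module, count in modules_still_in_use(sample_log).items():
        print(f"{module}: unload failed {count} time(s)")
```

On the unit itself, passing the contents of /proc/modules to `loaded_module_refcounts` would show whether any nvidia module still has a non-zero reference count before Field Diagnostics is started again. A non-zero count suggests something (a container, display service, or inference process) still holds the GPU.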

Per our DM follow-up, the RMA has been approved. The support team should be reaching out to you soon. If you experience any issues, please let me know.
