Unexpected Shutdown During ComfyUI Inference on DGX Spark (Occurs on Two Units)


Hello,

I would like to discuss an unexpected shutdown issue that occurs during ComfyUI inference.

I own two NVIDIA DGX Spark systems. The problem initially appeared on only one unit, but it now occurs on both.

I followed the field diagnostics guide provided here:

I ran the full diagnostic test three times on each device, and all tests passed successfully.

I also reviewed and referenced the following forum post:

As mentioned in that post, applying a power limit prevents the shutdown. However, I do not understand why the system cannot sustain the power required to hold its stock/default clock speeds.

For context, I am able to run the recently released Qwen 3.5 122B-A10B model with 250 concurrent requests for game translation workloads. Even under heavy load with extremely high cooling fan activity, the system does not shut down.

However, when I start inference in ComfyUI, the system shuts down after only 1–2 steps. Occasionally it completes without issue, but in most cases the power is abruptly cut off.
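In case it helps narrow this down, below is a minimal sketch of how power draw could be logged right up to the moment of shutdown. It assumes `nvidia-smi` on the DGX Spark reports `power.draw`, `clocks.sm`, and `temperature.gpu` (some fields may come back as `[N/A]` on this platform), and the function names, the 100 ms interval, and the log file name are my own placeholders. Each sample is flushed to disk immediately so the last readings survive an abrupt power cut.

```python
import csv
import io
import os
import subprocess

# Fields requested from nvidia-smi. Whether the DGX Spark reports all of
# them is an assumption; unsupported fields are handled as None below.
FIELDS = ["timestamp", "power.draw", "clocks.sm", "temperature.gpu"]


def parse_sample(line):
    """Parse one CSV line from nvidia-smi into a dict.

    Numeric fields that cannot be read (e.g. "[N/A]") become None.
    """
    values = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    sample = {"timestamp": values[0].strip()}
    for name, raw in zip(FIELDS[1:], values[1:]):
        try:
            sample[name] = float(raw.strip())
        except ValueError:
            sample[name] = None
    return sample


def log_samples(path="power_log.csv", interval_ms=100):
    """Append nvidia-smi samples to `path`, syncing after every line.

    Hypothetical helper: runs until interrupted or the machine powers off.
    """
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-lms", str(interval_ms),
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(path, "a") as out:
        for line in proc.stdout:
            if not line.strip():
                continue
            out.write(line)
            out.flush()
            os.fsync(out.fileno())  # force the sample onto disk right away
```

Running `log_samples()` in a separate terminal before starting ComfyUI, and again before the LLM workload, would allow the last few samples of each run to be compared after reboot. A 100 ms polling interval may still miss very short power spikes, so an absence of high readings would not rule them out.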

Is this a ComfyUI-specific issue?
Is this a hardware issue with my units?
Could it be variability between individual devices?

What makes this more confusing is that my wife also owns a DGX Spark, and her system does not shut down during ComfyUI inference, regardless of power limits.

Given that both of my units now exhibit the same behavior while passing diagnostics, I am unsure whether this is a firmware, power delivery, thermal protection, or workload-related issue, or whether I should proceed with an RMA request.

I would greatly appreciate any insight into what might be causing this behavior.
