Hardware-level power cut during nanochat pretraining

I tried to pre-train nanochat on a DGX Spark, but the machine hit a sudden hardware-level power cut. Has anyone had a similar experience? How did you resolve it?

Based on nvidia-bug-report.log, the kernel messages include:

ACPI: thermal: [Firmware Bug]: No valid trip points!

Thanks

This message is benign. Its presence in the log doesn't indicate a problem unless you are seeing specific performance or functional issues. Are you?

I cannot find any other obvious issue in the log, so please guide me if you have any recommendations. The problem is that the machine (DGX Spark) shuts down suddenly during training. I did see the CPU temperature go above 95 °C from time to time, so I assume that is the root of the issue.
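One way to check the overheating assumption is to log temperatures to disk throughout training, so the last readings before a shutdown survive. Below is a minimal Python sketch. It assumes the machine's Linux install exposes the standard sysfs interface `/sys/class/thermal/thermal_zone*/`, where `temp` is in millidegrees Celsius and `type` names the zone. Which zone corresponds to the CPU on a DGX Spark is an assumption, so check each zone's `type`. The function names, the 95 °C warning threshold (taken from the reading above), and the log file name are hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Dict, Optional


def read_thermal_zones(base: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {"thermal_zoneN:type": temperature in degrees C} for each readable zone.

    Linux sysfs reports <zone>/temp in millidegrees Celsius and the zone's
    name in <zone>/type.
    """
    readings: Dict[str, float] = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric zones are skipped
        readings[f"{zone.name}:{kind}"] = millideg / 1000.0
    return readings


def log_temperatures(path: str, interval_s: float = 5.0, warn_c: float = 95.0,
                     max_samples: Optional[int] = None,
                     base: str = "/sys/class/thermal") -> None:
    """Append one line per sample to `path`, fsync'ing so lines survive a power cut.

    Runs forever unless max_samples is given.
    """
    taken = 0
    with open(path, "a") as f:
        while max_samples is None or taken < max_samples:
            readings = read_thermal_zones(base)
            hottest = max(readings.values(), default=float("nan"))
            flag = " WARN" if hottest >= warn_c else ""  # mark samples at/above threshold
            zones = " ".join(f"{k}={v:.1f}" for k, v in readings.items())
            f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} max={hottest:.1f}C{flag} {zones}\n")
            f.flush()
            os.fsync(f.fileno())  # force the line to disk before the next sample
            taken += 1
            if max_samples is None or taken < max_samples:
                time.sleep(interval_s)
```

Start it in a separate process before launching training, for example by calling `log_temperatures("thermal_log.txt")` from a second terminal. After the next shutdown, read the tail of the file. Calling `fsync` after every line costs a little I/O, but it keeps the final samples from being lost when power drops.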