DGX Spark temperature too high, automatic shutdown

Check your logs first to confirm you're actually crashing from overheating and not OOM. I repasted and put down new thermal pads (all Thermal Grizzly Kryonaut) and was still crashing. It turned out I was crashing from both heat and OOM, which made it hard to isolate one from the other. Even after the repaste fixed the thermal side, I kept crashing from OOM despite using vetted community recipes for my cluster. I changed my swap size and, yes (sadly), set some GPU clock limits, and I haven't crashed since. A moderator posted about it here.
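If it helps, here's a rough sketch of how you could sort kernel log lines into "OOM" versus "thermal" buckets. The function name `classify_log_lines` and the regex patterns are my own assumptions, not anything official. The OOM strings match what the Linux OOM killer usually prints, but the thermal wording varies by kernel and driver, so adjust the patterns to whatever your own logs show.

```python
import re

# Assumed patterns: typical Linux OOM-killer wording, plus a guess at
# common thermal wording. Tune these to match your own logs.
OOM_PATTERN = re.compile(r"out of memory|oom-kill|invoked oom-killer", re.IGNORECASE)
THERMAL_PATTERN = re.compile(r"critical temperature|thermal shutdown|overheat", re.IGNORECASE)


def classify_log_lines(lines):
    """Count how many log lines look OOM-related vs. thermal-related."""
    counts = {"oom": 0, "thermal": 0}
    for line in lines:
        if OOM_PATTERN.search(line):
            counts["oom"] += 1
        if THERMAL_PATTERN.search(line):
            counts["thermal"] += 1
    return counts
```

One way to use it: save the kernel messages from the boot that crashed (for example with `journalctl -k -b -1`, which needs a persistent journal) to a file and pass the open file to `classify_log_lines`. If both counts are nonzero, you may be in the same spot I was, with two separate problems to fix. Keep in mind that a hard thermal power-off may not leave any log line at all, so a zero thermal count doesn't rule heat out.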