CPU overheat causes shutdown

Our DGX has recently been shutting down on its own. In the kernel log (kern.log) we found “core temperature above threshold” messages for the CPUs (see Figure 1). We opened up the DGX and found that the coolant tank for the GPUs was running low. However, the CPU coolant appears to be held only inside the coolant pipes, isolated from the GPU coolant tank, and we couldn’t find any indicator of how much CPU coolant is left. According to the DGX documentation, the coolant kit only refills the coolant tank, which serves the GPUs only. How can the CPU coolant be refilled?

I would be grateful for any advice, for corrections if I have misunderstood anything, or for any other information that may help solve this issue.
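For anyone who wants to check their own machine for the same symptom, here is a minimal Python sketch that filters kernel log lines for the phrase quoted above. The function names are made up for illustration, and the default path /var/log/kern.log is an assumption (typical for Ubuntu-based systems); the log may live elsewhere on your installation.

```python
import re
from pathlib import Path

# Phrase quoted in the post above, matched case-insensitively.
THERMAL_PATTERN = re.compile(r"core temperature above threshold", re.IGNORECASE)


def find_thermal_events(lines):
    """Return the log lines that mention a core temperature threshold event."""
    return [line.rstrip("\n") for line in lines if THERMAL_PATTERN.search(line)]


def scan_log(path="/var/log/kern.log"):
    """Scan one log file and return its matching lines.

    Assumption: the default path is where the kernel log is stored;
    pass a different path if your system keeps it somewhere else.
    """
    with Path(path).open(errors="replace") as handle:
        return find_thermal_events(handle)
```

Calling `scan_log()` returns the matching lines, so `len(scan_log())` gives a rough count of how often the threshold was hit. Reading the log may require root permissions.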

We are currently facing a similar issue with our DGX V100 system, which reaches CPU temperatures of around 100°C (according to lm-sensors) without any load on the system.
Changing the thermal paste didn’t solve the issue.

Have you found a solution in the meantime, or any further ways to troubleshoot the issue?
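In case it helps with troubleshooting, idle temperatures can also be read without lm-sensors from the Linux hwmon sysfs files, which report millidegrees Celsius. The sketch below is only an illustration: the function names are invented, the 90°C limit is an arbitrary example value and not an NVIDIA or Intel specification, and which sensors show up depends on the drivers loaded on your system.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/tempX_input": degrees_celsius} for readable sensors."""
    temps = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        # Fall back to the directory name if the chip does not expose a name.
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now, skip it
            key = f"{chip_dir.name}:{chip}/{input_file.name}"
            temps[key] = millidegrees / 1000.0
    return temps


def hot_sensors(temps, limit_c=90.0):
    """Return only the sensors at or above limit_c (example limit, an assumption)."""
    return {name: value for name, value in temps.items() if value >= limit_c}
```

Running `hot_sensors(read_hwmon_temps())` on an idle machine should return an empty dict when cooling is healthy. Readings near 100°C at idle, as described above, would point to the cooler rather than the workload.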


We simply changed the entire CPU coolant and the problem was solved.

Hello @k50112113,
we have the same problem with a “DGX Station v100”, and to me it looks like the CPU cooler is a closed-loop water cooler from Corsair, with no way to refill or change anything. How did you refill or change the coolant in this system?

Hello @schreihs,
have you found a solution for your problem?

Hi @edv14,

I have seen some tutorials on refilling the Corsair cooler, but you have to open up the closed coolant loop and refill it carefully (without letting any air in), which seems difficult.

So we resolved this issue by simply replacing the entire Corsair cooler. This is the one we bought: CORSAIR - iCUE H60X RGB ELITE AIO Liquid CPU Cooler 120mm Radiator. It costs about $80 online. Please make sure the “processor socket” specified is compatible with your DGX Station.

I hope this helps :)


Hello @k50112113,
thanks for the quick reply and the clarification, that helps a lot. Just to be sure, you have the “DGX Station v100” (DGX DL WS 4V100/256GB 32G) as well, right?

Hi @edv14,



Perfect, thanks @k50112113!

Hi Guys,

Yesterday we installed a normal CPU air cooler (be quiet! Pure Rock 2), with no water cooling. It works perfectly fine: with training running on all 4 GPU cards, the CPU stays under 40°C.

Thanks @schreihs!