Our DGX has recently been shutting down on its own. After checking kern.log, we found “core temperature above threshold” messages for the CPUs (see Figure 1). We opened up the DGX and found that the coolant tank for the GPUs was running low. However, the CPU coolant seems to be held only inside the coolant pipes, isolated from the GPU coolant tank, and we couldn’t find any indicator of how much CPU coolant remains. According to the DGX documentation, the coolant kit only refills the coolant tank, which serves only the GPUs. How can we refill the coolant for the CPUs?
I would be grateful for any advice, including corrections if I have misunderstood anything, or any other information that might help solve this issue.