SHUTDOWN_REQ pulled low before reaching Thermal Throttling

We are experiencing a thermal problem using the Orin NX 16GB module on a custom carrier with our thermal solution:
The module pulls SHUTDOWN_REQ low when the SoC reaches a temperature of about 85°C. We verified that neither LPDDR5 component exceeds the Tcase of 85°C, as described in the Thermal Design Guide. The VDD_IN supply is also stable at 5V.

Test conditions:

  • 15W power mode
  • running memtester tool in parallel on four cores

The last lines of tegrastats:

06-19-2023 14:47:46 RAM 15138/15430MB (lfb 34x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@3199 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@82.406C CPU@85.625C SOC2@81.5C SOC0@82.75C CV1@81.531C GPU@81.687C tj@85.625C SOC1@82.843C CV2@80.437C VDD_IN 12101mW/10229mW VDD_CPU_GPU_CV 3095mW/2451mW VDD_SOC 3735mW/3228mW
06-19-2023 14:47:47 RAM 15138/15430MB (lfb 34x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@3199 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@82.312C CPU@85.375C SOC2@81.531C SOC0@82.656C CV1@81.468C GPU@82.187C tj@85.375C SOC1@82.875C CV2@80.468C VDD_IN 12022mW/10230mW VDD_CPU_GPU_CV 3095mW/2451mW VDD_SOC 3696mW/3229mW
06-19-2023 14:47:48 RAM 15138/15430MB (lfb 34x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@3199 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@82.375C CPU@85.218C SOC2@81.718C SOC0@82.593C CV1@81.406C GPU@82.218C tj@85.218C SOC1@82.843C CV2@80.437C VDD_IN 12061mW/10232mW VDD_CPU_GPU_CV 3095mW/2452mW VDD_SOC 3696mW/3229mW
06-19-2023 14:47:49 RAM 15138/15430MB (lfb 34x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@3199 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@82.406C CPU@85.343C SOC2@81.718C SOC0@82.625C CV1@81.468C GPU@82.25C tj@85.343C SOC1@83.187C CV2@80.406C VDD_IN 12141mW/10233mW VDD_CPU_GPU_CV 3134mW/2452mW VDD_SOC 3775mW/3229mW
06-19-2023 14:47:50 RAM 15138/15430MB (lfb 34x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@3199 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@82.718C CPU@85.781C SOC2@81.718C SOC0@82.625C CV1@81.75C GPU@82.156C tj@85.781C SOC1@83.125C CV2@80.625C VDD_IN 12101mW/10234mW VDD_CPU_GPU_CV 3134mW/2453mW VDD_SOC 3735mW/3230mW
06-19-2023 14:47:51 RAM 15138/15430MB (lfb 34x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@3199 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@82.468C CPU@85.593C SOC2@81.718C SOC0@82.968C CV1@81.718C GPU@81.906C tj@85.593C SOC1@83.062C CV2@80.75C VDD_IN 12101mW/10235mW VDD_CPU_GPU_CV 3134mW/2453mW VDD_SOC 3735mW/3230mW
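
To compare the sensor readings across lines like the ones above, a small parser can help. The following is a minimal sketch, not an official tool: it assumes temperature fields always have the form NAME@VALUEC, as in the log above, and the function names parse_temps and hottest are made up for illustration.

```python
import re

# Matches sensor readings such as "CPU@85.625C" or "tj@98C".
# Load/frequency fields such as "100%@1651" have no sensor name directly
# before the "@" and no trailing "C", so they are not matched.
TEMP_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9]*)@(\d+(?:\.\d+)?)C\b")


def parse_temps(line):
    """Return {sensor_name: temperature_in_celsius} for one tegrastats line."""
    return {name: float(value) for name, value in TEMP_RE.findall(line)}


def hottest(line):
    """Return (sensor_name, temperature) of the hottest sensor in the line."""
    temps = parse_temps(line)
    name = max(temps, key=temps.get)
    return name, temps[name]


if __name__ == "__main__":
    # Last line of the log above, shortened to the fields that matter here.
    sample = (
        "06-19-2023 14:47:51 CPU [100%@1651,100%@1651,100%@1651,100%@1651,"
        "off,off,off,off] EMC_FREQ 24%@3199 CV0@82.468C CPU@85.593C "
        "SOC2@81.718C SOC0@82.968C CV1@81.718C GPU@81.906C tj@85.593C "
        "SOC1@83.062C CV2@80.75C VDD_IN 12101mW/10235mW"
    )
    print(hottest(sample))
```

Run over the whole log, this makes it easy to see that no reported sensor is above roughly 86°C at the moment of shutdown.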

Is there any other component on the module to check, or another condition that can cause a shutdown request?

Thanks,
Ralf

Hi, as stated in the Design Guide: SHUTDOWN_REQ* is driven active (low) by the module if the system must be shut down, due to a software shutdown request, over-temperature event, undervoltage event, or other faults. Did you see any warning output before the shutdown?

I don’t see any warnings before shutdown.

Then there is no clue so far. Based on the tegrastats output, the module should not shut down at these temperatures. Do you have any log output?

There is nothing in the logs regarding overtemperature or other problems.
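
One way to cross-check the over-temperature path is to dump the thermal zones and their trip points from the standard Linux thermal sysfs interface and compare them with the temperatures seen at shutdown. This is a minimal sketch under assumptions: it relies only on the generic files type, temp, trip_point_N_temp and trip_point_N_type under /sys/class/thermal, the helper name read_thermal_zones is made up, and the zone names and trip values depend on the L4T release. It only shows what software can see; a shutdown triggered purely in hardware would not necessarily appear here.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return a list of dicts describing each thermal zone found under root."""
    zones = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            # sysfs reports temperatures in millidegrees Celsius.
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone is unreadable or currently offline
        trips = []
        for trip_temp in sorted(zone.glob("trip_point_*_temp")):
            trip_type = trip_temp.with_name(
                trip_temp.name.replace("_temp", "_type")
            )
            try:
                trips.append(
                    (
                        trip_type.read_text().strip(),
                        int(trip_temp.read_text().strip()) / 1000.0,
                    )
                )
            except (OSError, ValueError):
                continue
        zones.append({"zone": name, "temp_c": temp_c, "trips": trips})
    return zones


if __name__ == "__main__":
    for z in read_thermal_zones():
        trips = ", ".join(f"{kind}={temp:.1f}C" for kind, temp in z["trips"])
        print(f'{z["zone"]}: {z["temp_c"]:.1f}C  trips: {trips or "none"}')
```

If no trip point sits near 85°C, that would suggest the request is not coming from the SoC's software-visible thermal sensors.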

I did some further tests by heating specific module components only. It looks like heating up the VDD2 regulator on the bottom side to about 100°C is causing the problem. According to the Thermal Design Guide, this regulator has a Tcase of 150°C.

The VDD2 regulator's thermal shutdown threshold is 155°C. It won't shut down at only 100°C. Can you share the full UART log from when the issue happens?

You are right, it’s not the VDD2 regulator. But I think it’s one of the components above the regulator:
Screenshot 2023-06-26 085344

Here is a thermal image of the situation while heating this area. The snapshot was taken at the moment the module is pulling SHUTDOWN_REQ:
Fail_3

This test setup is not using our thermal solution, because I replaced the SO-DIMM socket with a vertical version to make the bottom side accessible. Also note that the system is idle.

Here is the UART log:
UART.txt (4.5 KB)

Is the image from a real use case or from heating the area directly? There are some SHUTDOWN_REQ-related components there. Some capacitors have a temperature threshold of only 85°C.

This is not the real use case. It’s a test setup to figure out what is causing the shutdown. You can see a hot air heater coming from the left side directed at the components.

I don’t think that is a valid test method. You should use a thermal image from a real use case to see whether the temperature in this zone causes the same failure.

Here is an image at the moment of failure during the memory test we use:

Screenshot 2023-06-27 075725

The hottest area is in the middle of the PCB at 94°C. The area in the upper left is at 90°C. The SoC temperature was close to the throttling temperature at about 97°C.

I did another test by attaching a small heat sink at the upper left mounting hole. This reduces the temperature in the upper left corner by about 5°C:

Screenshot 2023-06-27 090902

In this situation, the module doesn’t pull SHUTDOWN_REQ and it enters thermal throttling. Other regions of the board are even hotter than in the failing setup without the heat sink.

You said that some capacitors have a temperature threshold of 85°C. What exactly does this mean for us? Is this something that should be monitored?

So, it won’t fail if you attach a small heat sink at the upper left mounting hole, right? What is the tegrastats output in that case? If the SoC is at about 97°C, tegrastats should be able to report that.

In addition, can you test with the board horizontal rather than vertical? The upper left zone will be affected more in a vertical orientation.

Yes, the temperature is reported by tegrastats.
The orientation is already horizontal.

Can you please share the tegrastats output at that moment? Also, can you try a test with C679 removed, as marked below?

Here are the last lines of tegrastats:

06-27-2023 11:31:59 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.718C CPU@97.812C SOC2@95.156C SOC0@93.718C CV1@94.062C GPU@94.906C tj@97.812C SOC1@96.093C CV2@95.187C VDD_IN 11204mW/10250mW VDD_CPU_GPU_CV 3139mW/2772mW VDD_SOC 3343mW/3060mW
06-27-2023 11:32:00 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.718C CPU@97.812C SOC2@95.281C SOC0@93.875C CV1@94.062C GPU@94.937C tj@97.812C SOC1@96.125C CV2@95.031C VDD_IN 11164mW/10251mW VDD_CPU_GPU_CV 3139mW/2773mW VDD_SOC 3343mW/3060mW
06-27-2023 11:32:01 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.75C CPU@97.906C SOC2@95.281C SOC0@93.875C CV1@94.187C GPU@95.062C tj@97.906C SOC1@96.406C CV2@95.125C VDD_IN 11084mW/10252mW VDD_CPU_GPU_CV 3100mW/2773mW VDD_SOC 3343mW/3060mW
06-27-2023 11:32:02 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.843C CPU@97.843C SOC2@95.25C SOC0@93.812C CV1@94.156C GPU@95.093C tj@97.843C SOC1@96.281C CV2@95.062C VDD_IN 11084mW/10252mW VDD_CPU_GPU_CV 3100mW/2773mW VDD_SOC 3343mW/3061mW
06-27-2023 11:32:03 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.812C CPU@97.812C SOC2@95.312C SOC0@93.843C CV1@94.218C GPU@94.843C tj@97.812C SOC1@96.218C CV2@95.281C VDD_IN 11084mW/10253mW VDD_CPU_GPU_CV 3100mW/2773mW VDD_SOC 3264mW/3061mW
06-27-2023 11:32:05 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.781C CPU@97.875C SOC2@95.25C SOC0@93.75C CV1@94.093C GPU@94.843C tj@98.031C SOC1@96.281C CV2@95.187C VDD_IN 11164mW/10254mW VDD_CPU_GPU_CV 3139mW/2774mW VDD_SOC 3343mW/3061mW
06-27-2023 11:32:06 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.812C CPU@97.812C SOC2@95.218C SOC0@93.75C CV1@94.093C GPU@95.062C tj@97.812C SOC1@96.218C CV2@95.156C VDD_IN 11124mW/10255mW VDD_CPU_GPU_CV 3100mW/2774mW VDD_SOC 3343mW/3061mW
06-27-2023 11:32:07 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.781C CPU@97.656C SOC2@95.312C SOC0@93.875C CV1@94.062C GPU@95.031C tj@97.656C SOC1@96.281C CV2@95.156C VDD_IN 11044mW/10255mW VDD_CPU_GPU_CV 3100mW/2774mW VDD_SOC 3264mW/3061mW
06-27-2023 11:32:08 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.875C CPU@98.031C SOC2@95.312C SOC0@93.843C CV1@94.125C GPU@94.812C tj@98.031C SOC1@96.312C CV2@95.156C VDD_IN 11124mW/10256mW VDD_CPU_GPU_CV 3100mW/2775mW VDD_SOC 3343mW/3062mW
06-27-2023 11:32:09 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1531,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.906C CPU@97.968C SOC2@95.218C SOC0@93.906C CV1@94.281C GPU@94.812C tj@97.968C SOC1@96.437C CV2@95.125C VDD_IN 11124mW/10257mW VDD_CPU_GPU_CV 3100mW/2775mW VDD_SOC 3343mW/3062mW
06-27-2023 11:32:10 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.781C CPU@97.781C SOC2@95.375C SOC0@93.937C CV1@94.187C GPU@94.968C tj@97.781C SOC1@96.312C CV2@95.312C VDD_IN 11124mW/10257mW VDD_CPU_GPU_CV 3100mW/2775mW VDD_SOC 3304mW/3062mW
06-27-2023 11:32:11 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.812C CPU@97.812C SOC2@95.343C SOC0@93.875C CV1@94.218C GPU@94.968C tj@97.843C SOC1@96.281C CV2@95.281C VDD_IN 11084mW/10258mW VDD_CPU_GPU_CV 3100mW/2775mW VDD_SOC 3304mW/3062mW
06-27-2023 11:32:12 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.781C CPU@97.937C SOC2@95.218C SOC0@93.75C CV1@94.125C GPU@95.281C tj@97.937C SOC1@96.281C CV2@95.281C VDD_IN 11164mW/10259mW VDD_CPU_GPU_CV 3139mW/2776mW VDD_SOC 3383mW/3062mW
06-27-2023 11:32:13 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.781C CPU@97.937C SOC2@95.312C SOC0@94.062C CV1@94.125C GPU@95.156C tj@97.937C SOC1@96.312C CV2@95.187C VDD_IN 11204mW/10260mW VDD_CPU_GPU_CV 3139mW/2776mW VDD_SOC 3343mW/3063mW
06-27-2023 11:32:14 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.812C CPU@97.875C SOC2@95.468C SOC0@93.968C CV1@94.156C GPU@95.062C tj@97.875C SOC1@96.25C CV2@95.25C VDD_IN 11204mW/10260mW VDD_CPU_GPU_CV 3139mW/2776mW VDD_SOC 3343mW/3063mW
06-27-2023 11:32:15 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 27%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@96C CPU@98.187C SOC2@95.406C SOC0@93.937C CV1@94.343C GPU@95.062C tj@98.187C SOC1@96.406C CV2@95.187C VDD_IN 11243mW/10261mW VDD_CPU_GPU_CV 3139mW/2776mW VDD_SOC 3383mW/3063mW
06-27-2023 11:32:16 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.968C CPU@98.031C SOC2@95.343C SOC0@93.968C CV1@94.218C GPU@95.25C tj@98.031C SOC1@96.375C CV2@95.343C VDD_IN 11204mW/10262mW VDD_CPU_GPU_CV 3139mW/2777mW VDD_SOC 3343mW/3063mW
06-27-2023 11:32:17 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 27%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@96C CPU@98C SOC2@95.343C SOC0@93.968C CV1@94.25C GPU@95.187C tj@98C SOC1@96.406C CV2@95.312C VDD_IN 11243mW/10263mW VDD_CPU_GPU_CV 3139mW/2777mW VDD_SOC 3383mW/3064mW
06-27-2023 11:32:18 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 27%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.937C CPU@98C SOC2@95.312C SOC0@93.875C CV1@94.25C GPU@95.031C tj@98C SOC1@96.343C CV2@95.312C VDD_IN 11204mW/10263mW VDD_CPU_GPU_CV 3139mW/2777mW VDD_SOC 3383mW/3064mW
06-27-2023 11:32:19 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.968C CPU@98.187C SOC2@95.375C SOC0@94.062C CV1@94.25C GPU@95.062C tj@98C SOC1@96.375C CV2@95.312C VDD_IN 11124mW/10264mW VDD_CPU_GPU_CV 3139mW/2778mW VDD_SOC 3304mW/3064mW
06-27-2023 11:32:20 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 27%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.843C CPU@98C SOC2@95.437C SOC0@93.906C CV1@94.375C GPU@95.218C tj@98C SOC1@96.562C CV2@95.281C VDD_IN 11204mW/10265mW VDD_CPU_GPU_CV 3139mW/2778mW VDD_SOC 3343mW/3064mW
06-27-2023 11:32:21 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1483,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@96.062C CPU@98.218C SOC2@95.5C SOC0@94.093C CV1@94.343C GPU@95.125C tj@98.218C SOC1@96.5C CV2@95.281C VDD_IN 11124mW/10266mW VDD_CPU_GPU_CV 3100mW/2778mW VDD_SOC 3343mW/3065mW
06-27-2023 11:32:22 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@96.125C CPU@97.968C SOC2@95.406C SOC0@94.031C CV1@94.406C GPU@95.031C tj@97.968C SOC1@96.437C CV2@95.218C VDD_IN 11084mW/10266mW VDD_CPU_GPU_CV 3100mW/2778mW VDD_SOC 3264mW/3065mW
06-27-2023 11:32:23 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.937C CPU@97.906C SOC2@95.406C SOC0@93.968C CV1@94.312C GPU@95.5C tj@97.906C SOC1@96.406C CV2@95.468C VDD_IN 11044mW/10267mW VDD_CPU_GPU_CV 3100mW/2779mW VDD_SOC 3304mW/3065mW
06-27-2023 11:32:24 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 26%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.937C CPU@98C SOC2@95.406C SOC0@94C CV1@94.312C GPU@95.125C tj@98C SOC1@96.406C CV2@95.312C VDD_IN 11004mW/10267mW VDD_CPU_GPU_CV 3100mW/2779mW VDD_SOC 3343mW/3065mW
06-27-2023 11:32:25 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.906C CPU@98C SOC2@95.437C SOC0@94.218C CV1@94.218C GPU@95.343C tj@98C SOC1@96.375C CV2@95.343C VDD_IN 11044mW/10268mW VDD_CPU_GPU_CV 3100mW/2779mW VDD_SOC 3264mW/3065mW
06-27-2023 11:32:26 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.906C CPU@97.843C SOC2@95.531C SOC0@94.125C CV1@94.25C GPU@95.281C tj@97.843C SOC1@96.406C CV2@95.406C VDD_IN 11004mW/10269mW VDD_CPU_GPU_CV 3100mW/2779mW VDD_SOC 3264mW/3065mW
06-27-2023 11:32:27 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@95.843C CPU@98.281C SOC2@95.406C SOC0@94.125C CV1@94.406C GPU@95.125C tj@98.281C SOC1@96.625C CV2@95.312C VDD_IN 11044mW/10269mW VDD_CPU_GPU_CV 3100mW/2780mW VDD_SOC 3343mW/3066mW
06-27-2023 11:32:28 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 24%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@96.062C CPU@98.031C SOC2@95.562C SOC0@94.093C CV1@94.406C GPU@95.125C tj@98.031C SOC1@96.562C CV2@95.281C VDD_IN 11084mW/10270mW VDD_CPU_GPU_CV 3100mW/2780mW VDD_SOC 3304mW/3066mW
06-27-2023 11:32:29 RAM 12619/15430MB (lfb 673x4MB) CPU [100%@1651,100%@1651,100%@1651,100%@1651,off,off,off,off] EMC_FREQ 25%@2133 GR3D_FREQ 0%@305 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@96.062C CPU@97.906C SOC2@95.437C SOC0@94.062C CV1@94.406C GPU@95.25C tj@97.906C SOC1@96.5C CV2@95.468C VDD_IN 11044mW/10270mW VDD_CPU_GPU_CV 3100mW/2780mW VDD_SOC 3264mW/3066mW
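
In the lines above, one core briefly drops below 1651 MHz at 11:32:09 and again at 11:32:21. Such dips are easy to miss by eye, so here is a minimal sketch that flags them. It assumes the CPU field always has the form CPU [load%@freq,...] as in this log, and find_freq_dips is a made-up helper name. A dip may be consistent with throttling starting, but on its own it does not prove it, since other frequency scaling can cause the same pattern.

```python
import re

CPU_BLOCK_RE = re.compile(r"CPU \[([^\]]*)\]")
FREQ_RE = re.compile(r"(\d+)%@(\d+)")


def core_freqs(line):
    """Return the frequencies of all online cores in one tegrastats line."""
    match = CPU_BLOCK_RE.search(line)
    if not match:
        return []
    # Cores reported as "off" carry no frequency and are skipped.
    return [int(freq) for _load, freq in FREQ_RE.findall(match.group(1))]


def find_freq_dips(lines):
    """Return (timestamp, freqs) for lines where any online core runs
    below the highest core frequency seen anywhere in the log."""
    parsed = [(" ".join(line.split()[:2]), core_freqs(line)) for line in lines]
    parsed = [(ts, freqs) for ts, freqs in parsed if freqs]
    if not parsed:
        return []
    peak = max(max(freqs) for _ts, freqs in parsed)
    return [(ts, freqs) for ts, freqs in parsed if min(freqs) < peak]


if __name__ == "__main__":
    import sys

    # Usage: python3 find_dips.py < tegrastats.log
    for ts, freqs in find_freq_dips(sys.stdin):
        print(ts, freqs)
```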

Removing the capacitor doesn’t seem to change the behavior. The module is still shutting down.

Got it. So,

  1. A small heat sink at the upper left can help.
  2. Removing C679 won’t help.

Can you try removing R377, as shown below? Removing it cuts off the thermal_shutdown signal, so we can check whether shutdown_req is asserted by the internal thermal sensor or by a failing component. Be careful not to let the SoC temperature exceed the threshold.

The module is still pulling SHUTDOWN_REQ after removing R377.

So removing C679 and R377 (0 ohm) didn’t help. Can you please try removing R329 (0 ohm), as shown below?
r329

Ok, should I install R377 again?

No, just leave it unconnected.