DRAM Cooling Jetson Nano

We are seeing excursions above the 70 °C PLL thermal limit when operating our Jetson Nano with passive cooling. This produces various error entries in dmesg; however, we are unsure whether these excursions cause any performance throttling on the device, i.e. frequency scaling. We checked the “current EMC frequency” of the device below and above 70 °C with cat /sys/kernel/debug/tegra_bwmgr/emc_rate, but there was no change. We had originally assumed that the “DRAM Cooling” entry in this table (TABLE) meant the EMC frequency was throttled when the thermal limit was reached, but this does not seem to be the case.

Does anyone know the definition of “DRAM Cooling” as used in the technical documentation provided by NVIDIA?

hello Detlev,

could you please also run the tegrastats utility to monitor the reported temperatures?
thanks


Below is the tegrastats output, printed at 30-minute intervals:

RAM 3167/3964MB (lfb 18x4MB) SWAP 636/1982MB (cached 26MB) IRAM 0/252kB(lfb 252kB) CPU [21%@1479,61%@1479,42%@1479,36%@1479] EMC_FREQ 26%@1600 GR3D_FREQ 70%@921 VIC_FREQ 0%@192 APE 25 PLL@69.5C CPU@76C iwlwifi@68C PMIC@100C GPU@74.5C AO@75.5C thermal@75C POM_5V_IN 7816/7816 POM_5V_GPU 3129/3129 POM_5V_CPU 1864/1864
RAM 3168/3964MB (lfb 18x4MB) SWAP 635/1982MB (cached 26MB) IRAM 0/252kB(lfb 252kB) CPU [65%@1479,56%@1479,44%@1479,52%@1479] EMC_FREQ 23%@1600 GR3D_FREQ 59%@921 VIC_FREQ 0%@192 APE 25 PLL@70.5C CPU@77C iwlwifi@68C PMIC@100C GPU@75C AO@77C thermal@75.75C POM_5V_IN 7737/7737 POM_5V_GPU 3129/3129 POM_5V_CPU 1944/1944
RAM 3171/3964MB (lfb 18x4MB) SWAP 634/1982MB (cached 27MB) IRAM 0/252kB(lfb 252kB) CPU [74%@1479,60%@1479,50%@1479,36%@1479] EMC_FREQ 25%@1600 GR3D_FREQ 99%@921 VIC_FREQ 0%@192 APE 25 PLL@70C CPU@76C iwlwifi@69C PMIC@100C GPU@74.5C AO@76C thermal@75C POM_5V_IN 8729/8729 POM_5V_GPU 4199/4199 POM_5V_CPU 1822/1822
RAM 3175/3964MB (lfb 18x4MB) SWAP 633/1982MB (cached 27MB) IRAM 0/252kB(lfb 252kB) CPU [98%@1479,98%@1479,96%@1479,99%@1479] EMC_FREQ 22%@1600 GR3D_FREQ 46%@921 VIC_FREQ 0%@192 APE 25 PLL@72C CPU@78.5C iwlwifi@69C PMIC@100C GPU@75C AO@77.5C thermal@76.5C POM_5V_IN 8769/8769 POM_5V_GPU 2733/2733 POM_5V_CPU 3486/3486
RAM 3162/3964MB (lfb 18x4MB) SWAP 636/1982MB (cached 27MB) IRAM 0/252kB(lfb 252kB) CPU [78%@1479,72%@1479,66%@1479,69%@1479] EMC_FREQ 23%@1600 GR3D_FREQ 99%@921 VIC_FREQ 0%@192 APE 25 PLL@69.5C CPU@76C iwlwifi@68C PMIC@100C GPU@73C AO@75.5C thermal@73.75C POM_5V_IN 9864/9864 POM_5V_GPU 3921/3921 POM_5V_CPU 3327/3327
RAM 3162/3964MB (lfb 18x4MB) SWAP 636/1982MB (cached 27MB) IRAM 0/252kB(lfb 252kB) CPU [96%@1479,96%@1479,97%@1479,98%@1479] EMC_FREQ 21%@1600 GR3D_FREQ 1%@921 VIC_FREQ 0%@192 APE 25 PLL@70.5C CPU@76.5C iwlwifi@67C PMIC@100C GPU@73.5C AO@76C thermal@73.75C POM_5V_IN 8015/8015 POM_5V_GPU 2261/2261 POM_5V_CPU 3406/3406
RAM 3159/3964MB (lfb 18x4MB) SWAP 637/1982MB (cached 27MB) IRAM 0/252kB(lfb 252kB) CPU [77%@1479,54%@1479,56%@1479,65%@1479] EMC_FREQ 25%@1600 GR3D_FREQ 99%@921 VIC_FREQ 0%@192 APE 25 PLL@67C CPU@72.5C iwlwifi@67C PMIC@100C GPU@71.5C AO@73C thermal@72.5C POM_5V_IN 7896/7896 POM_5V_GPU 3372/3372 POM_5V_CPU 1825/1825
RAM 3161/3964MB (lfb 18x4MB) SWAP 637/1982MB (cached 27MB) IRAM 0/252kB(lfb 252kB) CPU [98%@1479,92%@1479,93%@1479,96%@1479] EMC_FREQ 22%@1600 GR3D_FREQ 91%@921 VIC_FREQ 0%@192 APE 25 PLL@69.5C CPU@75C iwlwifi@67C PMIC@100C GPU@72C AO@75.5C thermal@73.5C POM_5V_IN 8293/8293 POM_5V_GPU 2535/2535 POM_5V_CPU 3288/3288
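To compare PLL temperature against EMC frequency across many samples, the lines above can be parsed rather than read by eye. The following is a minimal sketch, assuming the field layout seen in this particular output (`EMC_FREQ <load>%@<freq>` and `<sensor>@<temp>C`); the function name `parse_tegrastats` is hypothetical, and other L4T releases may format tegrastats differently.

```python
import re

# Shortened sample taken from the first tegrastats line above.
SAMPLE = "EMC_FREQ 26%@1600 GR3D_FREQ 70%@921 APE 25 PLL@69.5C CPU@76C GPU@74.5C"


def parse_tegrastats(line):
    """Extract EMC load/frequency and per-sensor temperatures from one line.

    Assumes the layout shown in this thread; returns None for EMC fields
    that are not present in the line.
    """
    emc = re.search(r"EMC_FREQ (\d+)%@(\d+)", line)
    # Sensor readings look like NAME@<number>C, e.g. PLL@69.5C.
    temps = {
        name: float(value)
        for name, value in re.findall(r"(\w+)@(-?\d+(?:\.\d+)?)C\b", line)
    }
    return {
        "emc_load_pct": int(emc.group(1)) if emc else None,
        "emc_freq": int(emc.group(2)) if emc else None,
        "temps_c": temps,
    }


if __name__ == "__main__":
    sample = parse_tegrastats(SAMPLE)
    print(sample["temps_c"].get("PLL"), sample["emc_freq"])
```

Feeding each logged line through this function makes it straightforward to filter for samples where the PLL reading is at or above 70 and check whether the EMC frequency field ever changes.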

hello Detlev,

those logs suggest it is working.
when the PLL temperature is > 70C, the DRAM cooling device is expected to activate and start polling the DDR thermal state to decide the EMC refresh rate based on temperature.
please also check the following for confirmation.
for example,

# cat /sys/class/thermal/cooling_device12/type
tegra-dram

# cat /sys/class/thermal/cooling_device12/cur_state
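The index in `cooling_device12` may differ between boards and software releases, so it can be safer to look the device up by its type. The sketch below assumes only the standard Linux sysfs thermal layout (`type`, `cur_state`, `max_state` under each `cooling_device*` directory); the helper name `find_cooling_device` and its `base` parameter are hypothetical, with `base` there mainly so the function can be exercised off-target.

```python
from pathlib import Path


def find_cooling_device(dev_type="tegra-dram", base="/sys/class/thermal"):
    """Return (path, cur_state, max_state) for the first cooling device
    whose type file matches dev_type, or None if there is no such device."""
    for dev in sorted(Path(base).glob("cooling_device*")):
        try:
            if (dev / "type").read_text().strip() != dev_type:
                continue
            cur_state = int((dev / "cur_state").read_text())
            max_state = int((dev / "max_state").read_text())
        except (OSError, ValueError):
            # Unreadable or malformed entry: skip it and keep looking.
            continue
        return dev, cur_state, max_state
    return None


if __name__ == "__main__":
    print(find_cooling_device())
```

A non-zero `cur_state` indicates that the cooling device is currently engaged; polling this alongside the PLL temperature shows at which reading it switches.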

Sorry, I don’t follow. The tegrastats readout above shows the EMC frequency consistently at 1600, even above 70 °C. Am I missing something?

hello Detlev,

I meant that the PLL temperature appears to drop once it goes above 70C.
could you please check cur_state to confirm whether the DRAM cooling device is actually activated?
thanks

Hi Jerry,

Yes, I can confirm that the “tegra-dram” cooling device triggers (cur_state reads 1) when above 71 °C.

But I still don’t understand the cooling mechanism: how does DRAM cooling influence temperature, and do those actions affect the performance of the product? For example, does it throttle the frequency of a specific circuit, or does it reduce the voltage?

@JerryChang?

hello Detlev,

it’s passive cooling: when the PLL temperature crosses the trip point temperature, the DRAM cooling device reads the DDR Die-0 and Die-1 temperature state to decide the self-refresh rate and the EMC DVFS table.
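The trip point mentioned here can be read back from sysfs. The following is a rough sketch that assumes the standard Linux thermal sysfs files (`trip_point_N_temp` in millidegrees Celsius and `trip_point_N_type`); the zone name `PLL-therm` used in the example is an assumption and should be checked against `cat /sys/class/thermal/thermal_zone*/type` on the device. The helper name `trip_points` is hypothetical. It only lists the trip points; it does not show which refresh rate or EMC DVFS table the driver selects.

```python
from pathlib import Path


def trip_points(zone_type, base="/sys/class/thermal"):
    """Return [(trip_type, temp_degC), ...] sorted by temperature for the
    first thermal zone whose type matches zone_type, or None if not found."""
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            if (zone / "type").read_text().strip() != zone_type:
                continue
        except OSError:
            continue
        trips = []
        for temp_file in zone.glob("trip_point_*_temp"):
            type_file = zone / temp_file.name.replace("_temp", "_type")
            try:
                # sysfs reports trip temperatures in millidegrees Celsius.
                temp_c = int(temp_file.read_text()) / 1000.0
                trips.append((type_file.read_text().strip(), temp_c))
            except (OSError, ValueError):
                continue
        return sorted(trips, key=lambda trip: trip[1])
    return None


if __name__ == "__main__":
    # "PLL-therm" is an assumed zone name; verify it on the target.
    print(trip_points("PLL-therm"))
```

Comparing the listed trip temperatures with the point at which the tegra-dram `cur_state` changes is one way to confirm which trip is bound to the DRAM cooling device.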
