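With the thermal chamber at an ambient temperature of 70℃, the module is powered on and the kernel log prints the following assertion: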
ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u
At this point the system cannot boot. Once the temperature drops, everything works normally.
*** Please note that this reply is generated by LLM automatically ***
The error message “ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u”, reported on a Jetson AGX Orin at 70℃, appears to be related to a boot issue on the Jetson AGX Orin platform.
To provide a comprehensive answer, I’ll break down the information and provide some context.
The Jetson AGX Orin is a high-performance AI computer module developed by NVIDIA. It’s a part of the NVIDIA Jetson platform, which is designed for edge AI applications.
The error message “ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u” seems to be related to the boot process of the Jetson AGX Orin. The “bl31” refers to BL31, the EL3 runtime firmware stage of Arm Trusted Firmware-A, which is loaded during boot and stays resident to service secure monitor calls and exceptions.
The “ehf.c:245” part identifies the source file and line of the failed assertion (ehf.c implements the firmware's Exception Handling Framework), and “pe_data->ns_pri_mask == 0u” is the condition that was expected to hold. Without more information, it’s difficult to determine the exact cause of the error.
However, based on the context of the question, it’s possible that the error is related to a boot failure or a hardware issue on the Jetson AGX Orin.
To troubleshoot this issue, I would recommend checking the following:
- Boot logs: Check the boot logs to see if there are any other error messages that can provide more information about the issue.
- Hardware configuration: Verify that the hardware configuration is correct and that all components are properly connected.
- Firmware version: Check the firmware version of the Jetson AGX Orin and ensure that it’s up-to-date.
- Boot process: Try to boot the Jetson AGX Orin in a different mode, such as recovery mode, to see if the issue persists.
It’s also worth noting that the Jetson AGX Orin has a known issue with PCIe link rate, which can cause boot failures. If you’re experiencing this issue, you may want to try disabling the PCIe link rate or using a different PCIe configuration.
In comparison to other Jetson modules, the Jetson AGX Orin has a similar architecture and boot process. However, the specific error message “ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u” is unique to the Jetson AGX Orin and may require specific troubleshooting steps.
In summary, the error message “ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u” is likely related to a boot issue on the Jetson AGX Orin platform. To troubleshoot this issue, it’s recommended to check the boot logs, hardware configuration, firmware version, and boot process. Additionally, disabling the PCIe link rate or using a different PCIe configuration may resolve the issue.
Please share the full log.
These are the power-on logs at ambient temperatures of 26°C and 70°C, respectively.
session-assert-log.log (76.5 KB)
This is the assert log.
normal_log.log (96.0 KB)
This is the normal log.
The JetPack version is 5.1.4.
Hi,
Is it possible to also test with the NV devkit?
[0000.850] I> enable_soctherm_polling = 0
[0000.853] I> max temp read = 89
[0000.856] I> min temp read = 88
[0000.859] I> Enabling thermtrip
Also, this does not look like 70°C; the readings are already around 90°C.
Testing with the NV devkit yields the same result.
[0000.843] I> BCT max_chip_limit = 118
[0000.846] I> BCT min_chip_limit = -43
Does the temperature in the logs refer to the TTP temperature or the junction temperature (Tj)? The default max_chip_limit value is 118℃.
After GPU power-on, heat generation causes the "max temp read" value to naturally exceed the ambient temperature. The "session-assert-log.log" file was recorded during the second power cycle. The first power cycle showed max temp read = 84 and triggered: ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u
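To make the comparison between the reported readings and the BCT limit concrete, here is a minimal sketch that pulls those values out of an early-boot log and computes the remaining headroom. The function names `thermal_summary` and `headroom` are hypothetical, the sample text is assembled from lines quoted in this thread, and the sketch assumes both values are in the same unit (°C). It does not settle whether "max temp read" is Tj or a TTP reading.

```python
import re

# Matches lines such as "[0000.853] I> max temp read = 89".
# The key names are taken from the log lines quoted in this thread.
TEMP_RE = re.compile(
    r"(max temp read|min temp read|BCT max_chip_limit|BCT min_chip_limit)"
    r"\s*=\s*(-?\d+)"
)

def thermal_summary(log_text):
    """Return the first occurrence of each thermal value found in the log."""
    values = {}
    for key, val in TEMP_RE.findall(log_text):
        values.setdefault(key, int(val))
    return values

def headroom(values):
    """Difference between the BCT upper limit and the max reading, if both exist."""
    if "BCT max_chip_limit" in values and "max temp read" in values:
        return values["BCT max_chip_limit"] - values["max temp read"]
    return None

sample = """[0000.843] I> BCT max_chip_limit = 118
[0000.846] I> BCT min_chip_limit = -43
[0000.850] I> enable_soctherm_polling = 0
[0000.853] I> max temp read = 89
[0000.856] I> min temp read = 88
"""

summary = thermal_summary(sample)
print(summary)
print("headroom:", headroom(summary))
```

Running the same function over the 26°C and 70°C logs side by side would show how close each boot was to the limit when the assertion fired.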
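I'm not sure whether communicating in Chinese would be more efficient… What exactly is the actual ambient temperature right now?
The thermal chamber is set to 70℃. The GPU module heats up after power-on, which is why the temperature reading in the log you saw is 89℃. With the chamber at 70℃, the first power-on showed a temperature of 84℃ in the log; the log I uploaded is from the second power-on.
Does "max temp read" indicate the junction temperature or the case temperature? The maximum junction temperature threshold is 118℃, and 89℃ is nowhere near it.
Each time you hit this error, is it always accompanied by these three lines?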
on_cpu_report_start_op_fault: FMON_NAFLL_CLUSTER0_DSU: detected fault 0x20
cpufreq: cpufreq_online: CPU0: Running at unlisted initial frequency: 246000 KHz, changing to: 268800 KHz
ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u
Yes, all three lines appear every time, but the frequency values differ.
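Since the three lines are reported to appear together on every failure, a small script can check a captured log for all three and record the values that vary between runs. This is only a sketch: the function name `scan` and the signature labels are hypothetical, and the regular expressions are derived solely from the three lines quoted above, so other firmware versions may print them differently.

```python
import re

# One pattern per signature line quoted in this thread.
SIGNATURES = {
    "fmon_fault": re.compile(
        r"FMON_NAFLL_CLUSTER\d+_\w+: detected fault (0x[0-9a-fA-F]+)"
    ),
    "unlisted_freq": re.compile(
        r"Running at unlisted initial frequency: (\d+) KHz"
    ),
    "ehf_assert": re.compile(r"ASSERT: bl31/ehf\.c:(\d+)"),
}

def scan(log_text):
    """Collect the captured value of every signature hit, in log order."""
    hits = {name: [] for name in SIGNATURES}
    for raw in log_text.splitlines():
        # Serial captures may carry a real CR or a literal "^M" at line end.
        line = raw.rstrip("\r").replace("^M", "")
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits[name].append(match.group(1))
    return hits

sample = (
    "on_cpu_report_start_op_fault: FMON_NAFLL_CLUSTER0_DSU: detected fault 0x20^M\n"
    "cpufreq: cpufreq_online: CPU0: Running at unlisted initial frequency: "
    "246000 KHz, changing to: 268800 KHz^M\n"
    "ASSERT: bl31/ehf.c:245:pe_data->ns_pri_mask == 0u^M\n"
)

hits = scan(sample)
all_present = all(hits.values())
print(hits, all_present)
```

Collecting the `unlisted_freq` values across several failing boots would document how the frequency differs from run to run.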
Hello, is there any solution to this problem?
Hi,
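Does this problem occur on every one of your modules, or only on certain ones?
Also, if the temperature does not go above 85, are there any problems booting?
Every module has this problem. It also occurs at room temperature, with max temp read = 55.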
62℃-log.txt (80.9 KB)
This is the power-on log.
Hi,
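This whole thing does not sound right. Could you please reproduce the problem on a truly unmodified NV devkit? Please flash the software with sdkmanager as well.
I know you said earlier that you tested with the NV devkit. But since you did not attach a log, to us that is only a statement on your part.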