Please provide complete information as applicable to your setup.
**• Hardware Platform (Jetson / GPU)** AGX Xavier
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only) 4.4
What’s the thermal limit for running apps on the AGX Xavier?
I am running four identical pipelines that undistort video using a DeepStream plugin built on CUDA OpenCV. In one test run in “30W ALL” mode with the fan set to 255, everything seemed normal except that the frame rate was lower than expected. I then changed to “MAXN” mode with the fan still set to 255, and after a couple of minutes of running the system crashed with:
[ 512.088367] nvgpu: 17000000.gv11b gk20a_channel_timeout_handler:1570 [ERR] Job on channel 509 timed out
[ 512.089276] nvgpu: 17000000.gv11b nvgpu_set_error_notifier_locked:137 [ERR] error notifier set to 8 for ch 509
I then checked the tegrastats log. At the moment of the above error, the GPU temperature was around 45C:
RAM 13080/31919MB (lfb 4158x4MB) SWAP 0/15959MB (cached 0MB) CPU [99%@2265,38%@2265,36%@2265,38%@2265,43%@2265,47%@2265,36%\
@2265,38%@2265] EMC_FREQ 0% GR3D_FREQ 97% AO@40.5C GPU@45C Tdiode@44.25C PMIC@100C AUX@39C CPU@43C thermal@41.7C Tboard@39C\
GPU 13624/8571 CPU 4592/3326 SOC 6733/4620 CV 0/0 VDDRQ 2145/1429 SYS5V 3451/3016
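For anyone going through the attached tegrastats log, the temperature fields can be pulled out with a short script. This is only a minimal Python sketch, not an NVIDIA tool: `parse_temps` and `SAMPLE` are names made up here, and it assumes the `NAME@<value>C` field format seen in the line above, including the trailing backslashes where the terminal wrapped the line.

```python
import re

# Matches sensor readings such as "GPU@45C" or "Tdiode@44.25C".
# CPU load fields like "99%@2265" do not match: "%" is not a word
# character and no "C" follows the number.
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C")

# Excerpt of the tegrastats line quoted above (wrapped with a backslash).
SAMPLE = r"""EMC_FREQ 0% GR3D_FREQ 97% AO@40.5C GPU@45C Tdiode@44.25C PMIC@100C AUX@39C CPU@43C thermal@41.7C Tboard@39C\
GPU 13624/8571 CPU 4592/3326 SOC 6733/4620 CV 0/0 VDDRQ 2145/1429 SYS5V 3451/3016"""


def parse_temps(text):
    """Return {sensor_name: degrees_C} for every NAME@<value>C field."""
    # Re-join lines that were wrapped with a trailing backslash.
    joined = text.replace("\\\n", "")
    return {name: float(value) for name, value in TEMP_RE.findall(joined)}


if __name__ == "__main__":
    for sensor, temp_c in parse_temps(SAMPLE).items():
        print(f"{sensor:8s} {temp_c:6.2f} C")
```

Running `parse_temps` over every line of the log would show whether any sensor was climbing before the channel timeout.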
What is the highest temperature the GPU can run at without crashing? Is 45C the thermal limit of the GPU? If so, what does the AGX Xavier thermal spec of -25C to 80C mean?
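One way to see which limits the kernel itself is configured with is to read the standard Linux thermal sysfs entries. The sketch below is a hedged example, assuming the board exposes the generic `/sys/class/thermal/thermal_zone*` layout with `type`, `temp`, and `trip_point_N_temp` / `trip_point_N_type` files (values in millidegrees Celsius). The function name `read_thermal_zones` is made up here, and the zone names and trip values it prints depend on the board and the JetPack release.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_type: {"temp_c": float, "trips_c": [(kind, temp_c), ...]}}."""
    zones = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            # sysfs reports temperatures in millidegrees Celsius.
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some sensors may refuse reads; skip them

        trips = []
        for trip_temp in sorted(zone.glob("trip_point_*_temp")):
            trip_type = trip_temp.with_name(
                trip_temp.name.replace("_temp", "_type"))
            try:
                kind = trip_type.read_text().strip()
                trips.append((kind, int(trip_temp.read_text().strip()) / 1000.0))
            except (OSError, ValueError):
                continue
        zones[name] = {"temp_c": temp_c, "trips_c": trips}
    return zones


if __name__ == "__main__":
    for name, info in read_thermal_zones().items():
        print(f"{name}: {info['temp_c']:.1f} C, trip points: {info['trips_c']}")
```

If the GPU zone's trip points are well above 45C, that would suggest the channel timeout is not a thermal shutdown.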
Attached are the serial console log after_reflash_4_undistort_crash.log (33.7 KB) and the tegrastats log after_reflash_4_undistort_crash_tegrastats.log (260.7 KB), captured when the nvgpu_set_error_notifier_locked error happened.