Abnormal display during stress test

We have a customized fanless Orin NX 16GB product.
I am using stress-ng and gpu-burn (GitHub - wilicc/gpu-burn: Multi-GPU CUDA stress test) to run a stress test on it.
Since it's a fanless product, the CPU frequency is reduced when the temperature reaches around 99 degrees.
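
To show how the throttling can be observed, here is a minimal Python sketch that takes one sample of the thermal zone temperatures and the per-core CPU frequency. It assumes the standard Linux sysfs layout (`/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius and `cpufreq/scaling_cur_freq` in kHz); the function names are only illustrative, and the zone names reported on a given L4T release may differ.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_thermal_zones(root="/sys/class/thermal"):
    """Return a list of (zone type, temperature in degrees Celsius)."""
    temps = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        milli_c = read_int(zone / "temp")
        if milli_c is None:
            continue  # some zones refuse reads; skip them
        try:
            name = (zone / "type").read_text().strip()
        except OSError:
            name = zone.name
        temps.append((name, milli_c / 1000.0))
    return temps


def read_cpu_freqs(root="/sys/devices/system/cpu"):
    """Return a dict of cpuN -> current frequency in kHz."""
    freqs = {}
    for cpu in sorted(Path(root).glob("cpu[0-9]*")):
        khz = read_int(cpu / "cpufreq" / "scaling_cur_freq")
        if khz is not None:
            freqs[cpu.name] = khz
    return freqs


if __name__ == "__main__":
    # One sample; run it periodically (e.g. from a shell loop) to build a trace.
    print(read_thermal_zones())
    print(read_cpu_freqs())
```

Logging this output with a timestamp during the stress test would show whether the frequency drop coincides with the temperature approaching 99 degrees.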

One device became abnormal after one day. Its status is as follows:
1. The mouse still works, and both stress-ng and gpu-burn are still running.
2. The app icons are not displayed correctly.
3. No GUI app can be launched by double-clicking with the mouse.

I power-cycled the device and collected kern.log and syslog from /var/log.
I compared the logs of the abnormal device with those of a normal device, but I could not find any difference.
Please help to analyze. Thanks.
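
For anyone repeating the log comparison, a plain diff is noisy because every line carries a different timestamp. The sketch below is one possible approach, not the method used above: it strips the syslog prefix and kernel timestamp, masks numbers, and lists message shapes that appear only in the failing log. The assumed line shape (`Mon DD HH:MM:SS host kernel: [ secs.usecs] message`) and all function names are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "Mon DD HH:MM:SS host kernel: [ 1234.567890] message".
SYSLOG_PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
KERNEL_STAMP = re.compile(r"\[\s*\d+\.\d+\]\s*")
NUMBERS = re.compile(r"0x[0-9a-fA-F]+|\d+")


def normalize(line):
    """Drop timestamps and mask numbers so equivalent messages compare equal."""
    line = SYSLOG_PREFIX.sub("", line.rstrip("\n"))
    line = KERNEL_STAMP.sub("", line, count=1)
    return NUMBERS.sub("N", line)


def count_messages(lines):
    """Count how often each normalized message occurs."""
    return Counter(normalize(l) for l in lines if l.strip())


def only_in_first(fail_lines, pass_lines):
    """Return (count, message) pairs seen in fail_lines but never in pass_lines."""
    fail = count_messages(fail_lines)
    passed = count_messages(pass_lines)
    return sorted(((n, m) for m, n in fail.items() if m not in passed),
                  reverse=True)


def compare_files(fail_path, pass_path):
    """Print message shapes that are unique to the failing log."""
    with open(fail_path, errors="replace") as f, \
         open(pass_path, errors="replace") as p:
        for count, message in only_in_first(f, p):
            print(f"{count:6d}  {message}")
```

With the attached files this would be called as `compare_files("1_kern_log_fail.txt", "3_kern_log_pass.txt")`. Masking numbers also hides differences that exist only in a numeric value, so it is a first filter rather than a full comparison.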

Test Environment
1. Orin NX 16GB
2. L4T R35.4.1
3. Apps: stress-ng and gpu-burn
4. Customized Orin NX carrier board and fanless mechanical design
5. Office environment, around 25~30 degrees Celsius
6. Power mode is based on 25W, with GPU MAX changed from 408000000 to 535500000
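
To confirm that the modified GPU MAX value is actually in effect, one option is to read the frequency limits back from the kernel. The sketch below assumes the generic Linux devfreq sysfs interface (`/sys/class/devfreq/<device>/cur_freq`, `min_freq`, `max_freq`); which device entry corresponds to the GPU on this board is not stated here and has to be identified on the target. The function name is illustrative.

```python
from pathlib import Path


def read_devfreq(root="/sys/class/devfreq"):
    """Return device name -> {cur_freq, min_freq, max_freq} as reported by the driver."""
    result = {}
    for dev in sorted(Path(root).glob("*")):
        entry = {}
        for key in ("cur_freq", "min_freq", "max_freq"):
            try:
                entry[key] = int((dev / key).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # attribute missing or unreadable
        result[dev.name] = entry
    return result


if __name__ == "__main__":
    for name, freqs in read_devfreq().items():
        print(name, freqs)
```

If the GPU entry reports a `max_freq` of 535500000, the custom power mode has been applied; a value of 408000000 would mean the stock 25W limit is still active.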

logs.zip includes:
1_kern_log_fail.txt : kernel log of the abnormal device
2_syslog_fail.txt : system log of the abnormal device
3_kern_log_pass.txt : kernel log of another, normal device after a one-day test
4_syslog_pass.txt : system log of another, normal device after a one-day test
5_icon_incorrect.jpg : screenshot of the abnormal icon display
6_stress_test.jpg : screenshot of the stress-ng and gpu-burn test

logs.zip (21.9 MB)

Please try to reproduce this issue on the NV devkit, and check whether it is caused by overheating or high CPU load.

If it is caused by overheating, please try the same thermal solution as the NV devkit.
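
As a rough aid for the CPU load check, the following Python sketch normalizes the system load average by the number of cores. It assumes the standard `/proc/loadavg` format (first three fields are the 1-, 5- and 15-minute averages); the function names are illustrative and this is not an NVIDIA tool.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_per_cpu(text, cpu_count=None):
    """Return the three load averages divided by the number of CPUs."""
    cpus = cpu_count or os.cpu_count() or 1
    return tuple(round(v / cpus, 2) for v in parse_loadavg(text))


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as f:
            print(load_per_cpu(f.read()))
    except OSError:
        print("/proc/loadavg is not available on this system")
```

A per-CPU value above 1.0 means more runnable tasks than cores, which is expected while stress-ng is running, so the number is most useful when compared between the abnormal device and a normal one under the same test.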
