We have a customized fanless Orin NX 16GB product.
I use stress-ng and gpu-burn (GitHub - wilicc/gpu-burn: Multi-GPU CUDA stress test) to run a stress test on it.
Since it's a fanless product, the CPU frequency is reduced when the temperature reaches around 99 degrees Celsius.
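For reference, here is a minimal sketch of how the temperature and CPU frequency could be logged during the stress test, so that throttling can be correlated with the time of failure. It assumes the standard Linux sysfs interfaces (/sys/class/thermal and /sys/devices/system/cpu/cpuN/cpufreq) are exposed on this L4T release; the function names are hypothetical and this is not the tool that produced the attached logs.

```python
import glob
import os
import sys
import time


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_thermal_zones(base="/sys/class/thermal"):
    """Map thermal zone type -> temperature in degrees Celsius."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        name = read_text(os.path.join(zone, "type"))
        raw = read_text(os.path.join(zone, "temp"))
        if name is None or raw is None:
            continue  # some zones cannot be read; skip them
        try:
            temps[name] = int(raw) / 1000.0  # sysfs reports millidegrees
        except ValueError:
            continue
    return temps


def read_cpu_freqs_khz(base="/sys/devices/system/cpu"):
    """Map CPU name (e.g. 'cpu0') -> current scaling frequency in kHz."""
    freqs = {}
    for cpu in sorted(glob.glob(os.path.join(base, "cpu[0-9]*"))):
        raw = read_text(os.path.join(cpu, "cpufreq", "scaling_cur_freq"))
        if raw is not None and raw.isdigit():
            freqs[os.path.basename(cpu)] = int(raw)
    return freqs


def log_samples(count, interval_s=5.0, out=sys.stdout):
    """Write `count` timestamped samples of temperatures and CPU frequencies."""
    for i in range(count):
        temps = read_thermal_zones()
        freqs = read_cpu_freqs_khz()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        temp_part = " ".join(f"{n}={t:.1f}C" for n, t in temps.items())
        freq_part = " ".join(f"{c}={k}kHz" for c, k in freqs.items())
        print(f"{stamp} {temp_part} | {freq_part}", file=out, flush=True)
        if i + 1 < count:
            time.sleep(interval_s)
```

Calling, for example, `log_samples(17280, interval_s=5.0, out=open("thermal.log", "a"))` would cover roughly one day; writing to a file means the last samples before a hang survive a power cycle.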
One device became abnormal after one day. Its status is as follows:
1. The mouse still works, and both stress-ng and gpu-burn are still running.
2. The app icons are not displayed correctly.
3. No GUI app can be launched by double-clicking with the mouse.
I powered the device off and on again, then collected kern.log and syslog from /var/log.
I compared the logs of the abnormal device with those of a normal device, but I couldn't find any difference.
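A rough sketch of one way such a comparison could be automated: strip the timestamps and numeric values from each line, then list the message patterns that appear only in the failing log. The syslog prefix and kernel uptime formats assumed by the regular expressions, and all function names, are assumptions for illustration and may need adjusting to the actual log layout.

```python
import re
from collections import Counter

# Assumed line layout: "Mon DD HH:MM:SS host tag: [ uptime] message"
SYSLOG_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
KERNEL_UPTIME = re.compile(r"\[\s*\d+\.\d+\]\s*")
HEX_VALUE = re.compile(r"0x[0-9a-fA-F]+")
NUMBER = re.compile(r"\b\d+\b")


def normalize(line):
    """Reduce a log line to a pattern by removing timestamps and values."""
    line = SYSLOG_PREFIX.sub("", line.rstrip("\n"))
    line = KERNEL_UPTIME.sub("", line, count=1)
    line = HEX_VALUE.sub("<hex>", line)
    line = NUMBER.sub("<n>", line)
    return line.strip()


def message_patterns(lines):
    """Count how often each normalized, non-empty pattern occurs."""
    return Counter(p for p in (normalize(l) for l in lines) if p)


def only_in_first(first_lines, second_lines):
    """Return (pattern, count) pairs seen in the first log but not the second,
    most frequent first."""
    first = message_patterns(first_lines)
    second = message_patterns(second_lines)
    unique = [(p, c) for p, c in first.items() if p not in second]
    return sorted(unique, key=lambda item: (-item[1], item[0]))


def compare_files(fail_path, pass_path):
    """Compare two log files, tolerating undecodable bytes."""
    with open(fail_path, errors="replace") as f:
        fail_lines = f.readlines()
    with open(pass_path, errors="replace") as f:
        pass_lines = f.readlines()
    return only_in_first(fail_lines, pass_lines)
```

With the attached files, this could be run as `compare_files("1_kern_log_fail.txt", "3_kern_log_pass.txt")`. Messages that differ only in a timestamp, address, or counter collapse to the same pattern, so what remains is more likely to be specific to the failing device.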
Please help to analyze, thanks.
Test Environment
1. Orin NX 16GB
2. L4T R35.4.1
3. Apps: stress-ng and gpu-burn
4. Customized Orin NX carrier board and fanless mechanical design
5. Office environment, around 25~30 degrees Celsius
6. Power mode is based on 25W, with GPU MAX changed from 408000000 to 535500000
The logs.zip archive includes:
1_kern_log_fail.txt : kernel log of abnormal device
2_syslog_fail.txt : system log of abnormal device
3_kern_log_pass.txt : kernel log of another normal device after one day test
4_syslog_pass.txt : system log of another normal device after one day test
5_icon_incorrect.jpg : screenshot of abnormal icon display
6_stress_test.jpg : screenshot of stress-ng and gpu-burn test
logs.zip (21.9 MB)