I am encountering a peculiar issue during high-temperature testing of an Orin NX module running JP5.1.1 on our custom carrier board: the GMSL cameras unexpectedly freeze during the test. This behavior has not been observed at normal temperatures.
I have attached the dmesg and trace logs for your reference. Your assistance in analyzing and resolving this issue would be highly appreciated.
No, I may not have conveyed my point clearly. What I meant is that even after running the command to increase the clock, the error mentioned above still occurs, with no apparent improvement. I wanted to let you know that clock frequency may not be one of the root causes of this issue.
Best regards.
Hi, ShaneCCC
Yes, I have run up to eight cameras simultaneously and displayed their outputs. Testing a single camera on its own doesn't give us much insight either way, because our application scenario involves multiple cameras.
Best regards.
Hi, ShaneCCC
I have identified CRC checksum errors in the trace log and found the following issues in dmesg. Could you guide me on how to disable the CRC-related checks so that I can continue with further validation?
Hi, ShaneCCC
As I mentioned earlier, the cameras run continuously without issues at normal temperatures (20–30 ℃). However, during high-temperature testing (inside a temperature chamber set to 70 ℃), errors occur at runtime.
After the temperature is lowered, operation resumes. I am not sure whether this error originates in the Orin NX module or in the deserializer chip that outputs the MIPI signals, so I would appreciate your help analyzing the relevant errors in the trace log.
Hi, ShaneCCC:
Investigation of the Impact of GMSL Chip Temperature Rise on Camera Operation:
With the device placed outside the environmental chamber and only the GMSL deserializer chip and its surrounding components heated, the measured surface temperature of the chip exceeded 110 ℃, and the cameras operated normally without any freezing.
With the bottom of the PCBA exposed to room-temperature air, the module (PCBA + top-shell heat dissipation) was heated on a heating platform. As the module temperature approached 99 ℃, frequency reduction (throttling) set in, and after a while cameras video4–7 froze.
Based on these results, we preliminarily conclude that the camera freezing is not caused by high temperature of the MAX96712.
Hi, ShaneCCC
Yes, our main focus is on heating the Orin module. According to the Power GUI, once the CPU exceeds 90 ℃ it starts to throttle, and at the same time cameras video4–7 freeze.
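To line up the video4–7 freeze with the throttling point seen in the Power GUI, a small logger can record temperatures and CPU clocks once per second alongside the camera test. This is a minimal sketch, assuming the standard Linux sysfs interfaces (`/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius and `cpufreq/scaling_cur_freq` in kHz) are exposed on this JP5.1.1 image. The function names are hypothetical, and because thermal zone names vary between platforms, every zone is printed rather than a specific one.

```python
import glob
import os
import sys
import time


def read_int(path):
    """Read a single integer value from a sysfs file."""
    with open(path) as f:
        return int(f.read().strip())


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_type: temperature in degrees C} for every readable zone."""
    zones = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            # sysfs reports millidegrees Celsius
            zones[name] = read_int(os.path.join(zone, "temp")) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that are missing or unreadable
    return zones


def read_cpu_freqs(root="/sys/devices/system/cpu"):
    """Return {cpuN: current clock in MHz} for every CPU exposing cpufreq."""
    freqs = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # ".../cpu3/cpufreq/scaling_cur_freq" -> "cpu3"
        try:
            freqs[cpu] = read_int(path) // 1000  # kHz -> MHz
        except (OSError, ValueError):
            continue  # e.g. an offline CPU
    return freqs


def log_forever(interval_s=1.0, out=sys.stdout):
    """Print one line per interval: elapsed time, temperatures, CPU clocks."""
    start = time.monotonic()
    while True:
        temps = read_thermal_zones()
        freqs = read_cpu_freqs()
        t = time.monotonic() - start
        temp_str = " ".join(f"{k}={v:.1f}C" for k, v in temps.items())
        freq_str = " ".join(f"{k}={v}MHz" for k, v in freqs.items())
        print(f"{t:8.1f}s {temp_str} {freq_str}", file=out, flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    log_forever()
```

Redirecting its output to a file during the chamber or heating-platform test gives a timestamped record to compare against the moment video4–7 stop updating.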
Hi, ShaneCCC
I understand what you're saying, but I would prefer to know how to address this issue. We suspect that the problem is caused by CPU throttling leading to data errors.
However, in the trace log I found CRC check failures. If the driver were killed outright, I could accept that. Instead, the driver appears to keep running normally, yet the end result is incorrect.
As technical developers, we believe it is essential to identify the root cause of issues that arise in certain environments rather than simply avoiding those environments. Understanding the underlying cause helps your team address and resolve the problem, and it also lets us give more informed answers to customer inquiries.
While it may be possible to mitigate the impact of CPU throttling by improving the device's thermal management, it is hard to see how a minor system-level malfunction justifies a complete overhaul of the thermal solution; that seems impractical.
I suggest focusing on pinpointing the specific conditions or triggers that lead to CPU throttling in such environments. A more targeted approach should lead to a more effective solution, benefiting both your team's troubleshooting and our ability to address similar concerns from our customers.
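Once temperature and clock samples have been logged, the throttling intervals can be extracted and compared against the camera freeze timestamps. The sketch below is a hypothetical helper that works on already-parsed samples and treats any clock below 95% of a supplied maximum as throttled. That is only a meaningful signal if the clocks are otherwise held at maximum (for example after the clock-raising command mentioned earlier); under a dynamic governor a low clock may simply reflect low load. The sample values in the demo are illustrative, not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    t: float       # seconds since the start of the test
    temp_c: float  # CPU thermal-zone temperature in degrees Celsius
    freq_mhz: int  # current CPU clock in MHz


def find_throttle_events(
    samples: List[Sample], max_freq_mhz: int, ratio: float = 0.95
) -> List[Tuple[Sample, Sample]]:
    """Return the (first, last) sample of each run in which the clock stays
    below ratio * max_freq_mhz, i.e. each suspected throttling interval."""
    threshold = ratio * max_freq_mhz
    events: List[Tuple[Sample, Sample]] = []
    start: Optional[Sample] = None
    prev: Optional[Sample] = None
    for s in samples:
        throttled = s.freq_mhz < threshold
        if throttled and start is None:
            start = s  # a throttling interval begins here
        elif not throttled and start is not None:
            events.append((start, prev))  # interval ended on the previous sample
            start = None
        prev = s
    if start is not None:
        events.append((start, prev))  # still throttled at the end of the log
    return events


if __name__ == "__main__":
    log = [
        Sample(0, 80.0, 1984),
        Sample(1, 88.0, 1984),
        Sample(2, 91.0, 1420),
        Sample(3, 92.0, 1420),
        Sample(4, 85.0, 1984),
    ]
    for first, last in find_throttle_events(log, max_freq_mhz=1984):
        print(f"throttled {first.t:.0f}s-{last.t:.0f}s, onset at {first.temp_c:.1f} C")
```

Reporting the temperature at the onset of each interval, next to the time the cameras froze, is one way to show whether the freeze tracks throttling itself or only the high temperature.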