Unusual GMSL Camera Freezing Issue during High-Temperature Testing on Custom Carrier Board

Hello NVIDIA community,

I am encountering a peculiar issue during high-temperature testing on an Orin NX module using JP5.1.1. We are using a custom carrier board, and unexpectedly, GMSL cameras are experiencing freezes during this process. This behavior has not been observed at normal temperatures.

I have attached the dmesg and trace logs for your reference. Your assistance in analyzing and resolving this issue would be highly appreciated.

Thank you for your time and support.
dmesgnew.log (133.9 KB)
trace.log (9.9 MB)

Best regards,

By the way, we have successfully run the following commands:

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
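
The same steps can be written as a loop, which makes it easier to confirm that every clock was actually set. This is only a minimal sketch: the function name `boost_camera_clocks` and the optional root-directory argument are hypothetical additions for illustration, and the debugfs layout is assumed to match the paths in the commands above.

```shell
#!/bin/sh
# Lock the vi, isp, nvcsi and emc clocks and set each to its max_rate.
# The optional first argument (hypothetical) overrides the debugfs root,
# which allows a dry run against a scratch directory.
boost_camera_clocks() {
    clk_root="${1:-/sys/kernel/debug/bpmp/debug/clk}"
    for clk in vi isp nvcsi emc; do
        dir="$clk_root/$clk"
        if [ ! -d "$dir" ]; then
            echo "skip: $dir not found" >&2
            continue
        fi
        # Same two writes as the manual commands above.
        echo 1 > "$dir/mrq_rate_locked"
        cat "$dir/max_rate" > "$dir/rate"
        # Read the rate back so a failed write is visible.
        echo "$clk: $(cat "$dir/rate")"
    done
}
```

Run as root, calling `boost_camera_clocks` with no argument uses the real debugfs path. Comparing the printed rate with `max_rate` shows whether each setting took effect.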

Do you mean the problem was fixed by boosting the clocks?

Hi, ShaneCCC

No, I think I may not have conveyed my point clearly. What I truly meant is that even after executing the command to increase the clock, the above-mentioned error still occurs, and it seems that there is no improvement. I wanted to inform you that the clock frequency may not be one of the root causes of this issue.
Best regards.

Did you run multiple cameras? How many? What about the single-camera case?

Hi, ShaneCCC
Yes, I have run up to eight cameras simultaneously and displayed their outputs. Testing a single camera individually doesn’t provide significant insights for us, regardless of whether it has issues or not, as our application scenario involves multiple cameras.
Best regards.

Hi, ShaneCCC
I have identified CRC checksum errors in the trace log and found the following issues in dmesg. Perhaps you can guide me on how to disable CRC-related features and proceed with further validation.
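
For anyone repeating this check, a rough way to count and locate CRC-related lines in a log is sketched below. The function name `summarize_crc` is hypothetical, and the default search pattern `crc` is an assumption, since the exact message text in the trace log is not quoted here; pass a more specific pattern as the second argument if needed.

```shell
#!/bin/sh
# Count lines matching a pattern (default "crc", case-insensitive; an
# assumption about the log wording) and print the first few matches.
summarize_crc() {
    log="$1"
    pattern="${2:-crc}"
    # grep -c exits non-zero on zero matches but still prints 0.
    count=$(grep -ci -- "$pattern" "$log" || true)
    echo "$log: $count matching lines"
    # Line numbers help line up the first errors with timestamps in the log.
    grep -ni -- "$pattern" "$log" | head -n 5
}
```

For example, `summarize_crc trace.log` followed by `summarize_crc dmesgnew.log` gives a quick comparison of the two attached logs.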

Best regards,

Disabling CRC checking is currently not supported.
I suggest checking the sensor signal or configuration.

Hi, ShaneCCC
As I mentioned earlier, the camera runs continuously without issues at normal temperatures (20-30 degrees Celsius). However, during high-temperature testing (inside a temperature chamber set to 70 degrees), it encounters errors during runtime.

After lowering the temperature, it resumes operation. I am uncertain whether this error is related to the Orin NX module or the serializer chip outputting MIPI signals. Therefore, I seek your assistance in analyzing the relevant errors in the trace log.

Best regards.

Can you heat up only the sensors and the GMSL chip?

Hi, ShaneCCC:
We investigated how a temperature rise of the GMSL chip affects its operation:

  1. With the device outside the environmental chamber, we heated only the GMSL deserializer chip and its surrounding components until the measured surface temperature of the chip exceeded 110℃. The cameras operated normally without any freezing.
  2. With the bottom of the PCBA exposed to room-temperature air, we heated the module (PCBA + top shell heat dissipation) on a heating platform. When the module temperature approached 99℃, frequency reduction set in, and after a while cameras video4~7 froze.

Based on these results, our preliminary conclusion is that the camera freezing is not caused by the MAX96712 running hot.

On the Orin NX module, video4-7 correspond to CSI2~CSI3.

Best regards.

Did the second experiment heat up the CPU/GPU?

Hi, ShaneCCC
Yes, our main focus is on heating the Orin module. According to the Power GUI, when the CPU reaches over ninety degrees, it starts to throttle, and at the same time, there is an issue of cameras video4-7 freezing.
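
To line up the moment of throttling with the moment the cameras freeze, the module temperatures can be logged with timestamps while the test runs. The sketch below reads the standard Linux thermal sysfs entries (`/sys/class/thermal/thermal_zone*/type` and `temp`, the latter in millidegrees Celsius). The function name `log_thermal_zones` and the optional root-directory argument are hypothetical, and which zones exist on a given board is not assumed.

```shell
#!/bin/sh
# Print one line per thermal zone: "<epoch seconds> <zone type> <temp in C>".
# The optional first argument (hypothetical) overrides the sysfs root.
log_thermal_zones() {
    thermal_root="${1:-/sys/class/thermal}"
    now=$(date +%s)
    for zone in "$thermal_root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        zone_type=$(cat "$zone/type" 2>/dev/null || echo unknown)
        # Some zones can fail to read; skip them instead of aborting.
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        # sysfs reports millidegrees Celsius; convert to whole degrees.
        echo "$now $zone_type $((milli / 1000))"
    done
}
```

Running `while true; do log_thermal_zones; sleep 1; done >> thermal.log` alongside the camera pipeline gives a temperature timeline that can be compared against timestamps in dmesg.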

Best regards.

I suppose this is known behavior: when the CPU/GPU throttles, performance drops too far to handle the camera load.

Thanks

Hi, ShaneCCC
I acknowledge what you’re saying, but I would prefer to know how to address this issue. We’ve identified that the problem may be caused by CPU throttling leading to data errors.

However, from the trace log, I found issues related to CRC check failures. If the driver were killed directly, I could accept that. However, it seems that the driver is still functioning normally, but the end result is incorrect.

Best regards.

I would suggest improving the thermal solution to address it.

Thanks

Hi ShaneCCC,

As a technical developer, I believe it’s essential for us to identify the root cause of issues that may arise in certain environments rather than simply avoiding those environments. Understanding the underlying reasons for problems not only facilitates your efforts in addressing and resolving them but also enables us to provide more informed responses to customer inquiries.

While it might be possible to mitigate the impact of CPU throttling by improving the device’s thermal management, it’s challenging to comprehend how a minor malfunction on the system level justifies a comprehensive overhaul of the entire thermal solution. It seems somewhat impractical.

I suggest focusing on pinpointing the specific conditions or triggers that lead to CPU throttling in such environments. This approach can contribute to a more targeted and effective solution, benefiting both your team’s troubleshooting efforts and our ability to address similar concerns from our customers.

Best regards,