We ran into a problem when using Orin to capture image data over long periods. Anywhere from a few minutes to 2 hours after 12 video streams were started, one camera with 3 VC channels always stopped streaming suddenly, and 3 CPU cores were 100% occupied. Here are the kernel log and the CSI trace. Please help us identify the specific problem, thank you.
kernel0919.log (135.3 KB)
csi_trace.log (86.4 MB)
hello 541449841,
it looks like an error occurred that terminated the stream.
for example,
[ 2036.697688] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=2061400933312 sof_ts=2061405441600 gerror_code=2 gerror_data=400064 notify_bits=0"
[ 2036.697696] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=2061403499936 sof_ts=2061408002656 gerror_code=2 gerror_data=400064 notify_bits=0"
[ 2039.269640] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[ 2039.269642] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
BTW,
is it possible to test multi-cam use-case by running argus_camera with Multi Session mode?
Could you please give me some use cases or documentation about argus_camera with Multi Session mode? Thank you.
hello 541449841,
here’s the developer guide: Libargus Camera API.
you may also download the MMAPI package, $ sudo apt install nvidia-l4t-jetson-multimedia-api
please also see /usr/src/jetson_multimedia_api/argus/README.TXT
This error can be recovered by the VI recovery functions:
[ 2036.697688] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=2061400933312 sof_ts=2061405441600 gerror_code=2 gerror_data=400064 notify_bits=0"
[ 2036.697696] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=2061403499936 sof_ts=2061408002656 gerror_code=2 gerror_data=400064 notify_bits=0"
[ 2039.269640] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[ 2039.269642] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
The fatal error happened at [3706.453063]:
[ 3706.453063] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44475, flags: 0, err_data 4194402
[ 3706.458153] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44475, flags: 0, err_data 131072
[ 3706.463894] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44475, flags: 0, err_data 131072
[ 3706.486358] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44476, flags: 0, err_data 131072
[ 3706.496974] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44476, flags: 0, err_data 131072
[ 3706.497028] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44476, flags: 0, err_data 131072
[ 3706.519637] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44477, flags: 0, err_data 131072
[ 3706.524712] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44477, flags: 0, err_data 131072
[ 3706.530221] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44477, flags: 0, err_data 131072
[ 3706.552942] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44478, flags: 0, err_data 131072
[ 3706.558011] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44478, flags: 0, err_data 131072
[ 3706.563525] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44478, flags: 0, err_data 131072
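When a log like this runs to thousands of lines, it can help to summarize which frame indices and which err_data values appear. The following is a minimal sketch, not an NVIDIA tool: the regular expression is written only against the corr_err lines quoted in this thread, and the helper names (`summarize_corr_err`, `set_bits`) are hypothetical. It prints which bits are set in each err_data value but does not interpret them, since the bit meanings are not given here.

```python
import re
from collections import Counter

# Pattern written against the "corr_err: discarding frame" lines in this thread.
CORR_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+tegra-camrtc-capture-vi\s+tegra-capture-vi:\s+"
    r"corr_err: discarding frame (?P<frame>\d+), flags: (?P<flags>\d+), "
    r"err_data (?P<err>\d+)"
)


def summarize_corr_err(lines):
    """Count discarded-frame messages per frame index and per err_data value."""
    per_frame = Counter()
    per_err = Counter()
    for line in lines:
        match = CORR_ERR.search(line)
        if match is None:
            continue  # not a corr_err line
        per_frame[int(match.group("frame"))] += 1
        per_err[int(match.group("err"))] += 1
    return per_frame, per_err


def set_bits(value):
    """Return the positions of the bits set in value (bit 0 is the LSB)."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Three of the lines quoted above, used as sample input.
SAMPLE = [
    "[ 3706.453063] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44475, flags: 0, err_data 4194402",
    "[ 3706.458153] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44475, flags: 0, err_data 131072",
    "[ 3706.486358] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 44476, flags: 0, err_data 131072",
]

frames, errors = summarize_corr_err(SAMPLE)
for err, count in sorted(errors.items()):
    print(f"err_data {err} = {hex(err)}, bits {set_bits(err)}, seen {count}x")
```

To run it on the full log, pass the lines of kernel0919.log to `summarize_corr_err` instead of `SAMPLE`.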
hello 541449841,
the discarding-frame logs are sometimes only warning messages: because of an unsuccessful capture state, the driver drops frames and issues a requeue to ask for new buffers.
per your logs, the frame index keeps increasing; it looks like the channel encountered an uncorrectable error and must be reset.
are you using a SerDes chip to set up a total of 12 camera devices?
is the MIPI signaling intermittent? could you please also confirm the hardware connections.
thanks
No. There are four physical devices, named CameraA, CameraB, CameraC and CameraD.
CameraA (vc0, vc1, vc2) -> port serial_a (csi_port: 0) -> Orin. CameraB, CameraC and CameraD are connected similarly.
We’ve probed the MIPI signal with an oscilloscope and it has been stable.
Hi, JerryChang
We found that this phenomenon seems to be related to heat dissipation. When the module CPU temperature is about 60°C, the discarding-frame problem is easy to reproduce; when the module CPU temperature is in the 40s (°C), the problem almost never appears. Is this reasonable?
hello 541449841,
it may be because the system is thermally throttled. please also see the Software Clock Throttling section for more details.
you may also run the Tegrastats Utility to monitor the processor usage as a double check.
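Alongside tegrastats, temperature and per-core load can also be logged from the standard Linux interfaces, which makes it easier to line up a temperature reading with the moment the stream fails. The sketch below is only an illustration under stated assumptions: it reads the generic `/sys/class/thermal/thermal_zone*` files and `/proc/stat`, and the function names are hypothetical. The zone names exposed on a given Orin image are not assumed here.

```python
import glob
import os


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_type: temperature in degrees C} for each readable zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone not readable or not reporting a number
        temps[name] = millidegrees / 1000.0
    return temps


def parse_proc_stat(text):
    """Return {cpuN: (idle_ticks, total_ticks)} from the content of /proc/stat."""
    snapshot = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...) and skip the aggregate "cpu".
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        fields = [int(x) for x in parts[1:9]]
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
        snapshot[parts[0]] = (idle, sum(fields))
    return snapshot


def busy_percent(prev, curr):
    """Per-core busy percentage between two parse_proc_stat() snapshots."""
    usage = {}
    for cpu, (idle_now, total_now) in curr.items():
        if cpu not in prev:
            continue
        idle_before, total_before = prev[cpu]
        elapsed = total_now - total_before
        if elapsed > 0:
            usage[cpu] = 100.0 * (elapsed - (idle_now - idle_before)) / elapsed
    return usage


def read_proc_stat(path="/proc/stat"):
    """Take one snapshot of the per-core counters."""
    with open(path) as f:
        return parse_proc_stat(f.read())
```

A logging loop would call `read_proc_stat()` once per second, pass consecutive snapshots to `busy_percent()`, and print the result next to `read_thermal_zones()` with a timestamp.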
It doesn’t seem to have anything to do with thermal throttling.
num_csi_lanes = <12>;
max_lane_speed = <3500000>;
min_bits_per_pixel = <10>;
vi_peak_byte_per_pixel = <2>;
vi_bw_margin_pct = <25>;
max_pixel_rate = <2496000>;
isp_peak_byte_per_pixel = <5>;
isp_bw_margin_pct = <25>;
Will this happen if I set max_lane_speed to 9000000? With it changed to 3500000, there seem to be no problems so far. Are there any other considerations in the dtsi device tree?
For CPHY, is it sps or bit/s?
hello 541449841,
please note that the settings within tegra-camera-platform are the numbers for the scenario where all cameras are running.
max_lane_speed is the maximum lane speed setting, in Kbit/s.
After more testing yesterday, it seems that the setting of this parameter is not critical.
We found that when the CPU core temperature rose to 65°C, the VI module became abnormal and the CPU usage increased to 100%. Using CTRL+C to stop the stream then caused Orin to crash. I think this is a system-level bug, and I have also seen many similar cases. Is there an effective solution to this problem?
Test condition:
AGX Orin 32G module
R35.4.1
Camera pipeline
1 CMOS sensor with 4 virtual channels:
vc0 : 4096x3072@30fps — stream 0-csiport0-vc0 —Orin
vc1 : 4096x3072@30fps — stream 0-csiport0-vc1 —Orin
vc2 : 4096x3072@30fps — stream 0-csiport0-vc2 —Orin
vc3 : 4096x384@30fps — stream 0-csiport0-vc3 —Orin
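As a rough sanity check on this pipeline, the active-pixel throughput of the four virtual channels can be worked out from the resolutions and frame rates listed above. This is a back-of-the-envelope sketch: it assumes 10 bits per pixel (taken from min_bits_per_pixel in the dtsi fragment quoted earlier) and counts active pixels only, so blanking and CSI protocol overhead are not included and the real link rate is higher.

```python
# (width, height, fps) per virtual channel, as listed in the test condition.
STREAMS = {
    "vc0": (4096, 3072, 30),
    "vc1": (4096, 3072, 30),
    "vc2": (4096, 3072, 30),
    "vc3": (4096, 384, 30),
}

# Assumption: 10 bits per pixel, matching min_bits_per_pixel in the dtsi.
BITS_PER_PIXEL = 10


def pixel_rate(width, height, fps):
    """Active pixels per second for one stream."""
    return width * height * fps


def total_pixel_rate(streams):
    """Sum of active pixels per second over all streams."""
    return sum(pixel_rate(*params) for params in streams.values())


def payload_gbit_per_s(streams, bits_per_pixel):
    """Active-pixel payload in Gbit/s (no blanking, no protocol overhead)."""
    return total_pixel_rate(streams) * bits_per_pixel / 1e9


print(total_pixel_rate(STREAMS))        # 1179648000 pixels/s
print(payload_gbit_per_s(STREAMS, BITS_PER_PIXEL))  # about 11.8 Gbit/s
```

So the single sensor delivers roughly 1.18 Gpixel/s, or about 11.8 Gbit/s of pixel payload, through one CSI port.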
Please note:
With 4 CMOS sensors and 1 virtual channel per sensor, neither the 100% CPU usage nor the Orin crash occurs. The CPU core temperature rises above 70°C and the image data is still captured normally. The problems above only occur in the virtual channel usage scenario.
hello 541449841,
can you confirm this is thermal related? could you please add a fan to cool the target and check whether the status changes.
We’re running more tests to confirm. We ran the test both with and without fans.
Hi, Jerry
We tested it, and the bug isn’t about the absolute temperature value. From what we observed, when the temperature rises from a lower value, the problem triggers at 65°C; when it falls from a higher value (such as 70°C), it triggers at about 62°C.
However, tegrastats shows that no module’s operating frequency or load has changed significantly, yet the CPU usage is 100%. In addition, tegrastats shows that the SW throttling limit is 99°C (CV0, CPU, SOC1…) and the SW shutdown limit is 104.5°C.
Are there any other special treatments that take place around 65C?
hello 541449841,
I’m also concerned about this…
may I confirm the power mode you’re using? please also see the nvpmodel GUI to check it.
Power mode is MAXN
In the normal state, each of the eight CPU cores is less than 10% occupied. Only when the bug occurs does the usage of all CPU cores jump from about 10% to 100%.
test-video-smartsens.zip (16.8 MB)
This is the video I recorded during the test. The anomaly happens in the last few seconds of the video. Please take a look.