Random camera crash using Jetson Multimedia API sample

Hello,

I’m currently running 4 MIPI cameras on JetPack 4.6.2 (L4T 32.7.2) and hit a segmentation fault every few days.
I found the following Argus-related entries in syslog:

Feb 19 21:26:10 caper-desktop nvargus-daemon[15109]: (Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
Feb 19 21:26:10 caper-desktop nvargus-daemon[15109]: (Argus) Error FileOperationFailed:  (propagating from libs/rpc_socket_server/ServerSocketManager.cpp, function recvThreadCore(), line 185)
Feb 19 21:26:10 caper-desktop nvargus-daemon[15109]: (Argus) Error FileOperationFailed:  (propagating from libs/rpc_socket_server/ServerSocketManager.cpp, function run(), line 58)

How can I avoid this segmentation fault while the application is running?

Best

hello xzry,

this is an error reported at the system level regarding socket operations.

I would like more details to dig into this.
may I know what your test pipeline is? for example, do you have continuous file writes to the disk?
is it possible to recover the camera functionality by restarting the camera service?
i.e. $ sudo pkill nvargus-daemon and then $ sudo systemctl start nvargus-daemon

Hi Jerry,

Thanks for the response.

We don’t write files to disk. Basically, we base our application on jetson_multimedia_api/samples/13_multi_camera and add some deep learning models to process the frames. This issue has happened twice in two days; it seems pretty random to me.

sudo systemctl restart nvargus-daemon does recover the cameras, but for our use case it would be better to prevent the issue from happening in the first place.
Let me know if there is anything else I can provide to help with the analysis.
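
In the meantime, as a stop-gap, we are considering wrapping the application in a small restart loop so it recovers automatically. This is just an untested sketch on our side; ./multi_camera_app is a placeholder for our actual binary.

#!/bin/bash
# Stop-gap only: if the capture application exits unexpectedly,
# restart nvargus-daemon and then relaunch the application.
while true; do
    ./multi_camera_app
    echo "application exited, restarting nvargus-daemon and relaunching"
    sudo systemctl restart nvargus-daemon
    sleep 2
done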

hello xzry,

since you’ve added some deep learning models to process the frames, the system loading has increased.
is it possible to rule out camera errors? have you tried running a camera preview free run to confirm its long-run stability?
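
for example, a simple free-run preview via gst-launch could be used for such a long-run test; the sensor-id, resolution, and frame rate below are placeholders, please adjust them to match your sensor mode.

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=1920, height=1080, framerate=30/1' ! nvoverlaysink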

Hi Jerry,

There was no noticeable CPU or memory increase when the crash happened, in case this information is useful. CPU utilization stays around 60% and memory around 50%. The program can run without any problem for a long time, but all of a sudden it hits the above issue and crashes. From my point of view, it’s unlikely to be related to the deep learning models.

I’m now running the jetson_multimedia_api/samples/13_multi_camera sample alone, without the deep learning models, but given the randomness it may take some time to reproduce.

Is there any suggestion on what other logs I can grab when it happens, or any action I can take in the meantime?

hello xzry,

BTW,
please execute the commands below to boost all the VI/CSI/ISP clocks for testing.

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
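
you may read the rate nodes back to confirm the clocks are locked at their maximum rates, for example,

cat /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/rate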

we may check the error from the beginning;
you could collect the nvargus-daemon logs after the issue happens, i.e. $ sudo journalctl -b -u nvargus-daemon
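
to attach the logs to this thread, you may simply redirect them to files, for example,

$ sudo journalctl -b -u nvargus-daemon --no-pager > nvargus-daemon.log
$ dmesg > dmesg.log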

