JP5.1: nvarguscamera doesn't recover from a single NVCSI failure

Hello pepijn.vanheiningen,

Thanks for sharing the test results.

May I know how you enable the camera stream?
Could you please also try an Argus sample, for example Argus/public/samples/userAutoExposure?

Still the same as before:

gst-launch-1.0 nvarguscamerasrc ee-mode=0 tnr-mode=0 aeantibanding=0 silent=false ! fakesink

I have tried userAutoExposure as well, but it also stops working when the sensors are reset.

Hello pepijn.vanheiningen,

Please also see the forum topic "How to make Argus in Jetson 35.2.1 recover after a corrupted frame?"

It looks like the error handling mechanism does not work; we can reproduce the issue locally.
Test environment: l4t-r35.3.1 + AGX Orin + IMX274.

Let us check this issue internally.

Hi,

We hit the exact same issue. Any updates on this?

Hello casperlyngesen.mogensen,

Here's a pre-built update: Topic243051_Jun05.7z (1.8 MB)
Could you please update the binaries of JetPack-5.1.1/l4t-r35.3.1 with the attachment?
Please perform a warm reboot, i.e. $ sudo reboot, after replacing the binary files.

Hi

Thank you very much. I will test it this week and report back with the results.

Best Regards

Casper Mogensen

Hi again

I have been testing for a few days now. There is an improvement in the general handling of bad frames, but occasionally there is a segmentation fault in nvargus-daemon, on which my GStreamer pipeline (Python + OpenCV) sometimes hangs.

Is there anything you would like from me, such as logs, regarding the segfault?

We will start some long-term testing in the coming days.

Best Regards
Casper

Hello casperlyngesen.mogensen,

Would you please narrow down the issue? You may exclude OpenCV for testing.
For example, can you reproduce the same issue by simply running a gst pipeline that launches camera preview frames?

Sure. Would you like logs from Argus?

It can take hours to hit the segfault, but I will get back to you when it happens again.

Yes, please share the Argus daemon log, and also the kernel logs for reference.

Hi

The fault from earlier today is actually in the systemd logs, so I attached the log from nvargus-daemon. I do not have the kernel log from that time, but the only relevant entry I see is this:

[11536.474315] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] “General error queue is out of sync with frame queue. ts=11558435347200 sof_ts=11558435754816 gerror_code=2 gerror_data=a2 notify_bits=0”

I will try to recreate it without OpenCV, but I do not know when that will happen.

argus.log (58.2 KB)

BTW,
how often do you see the segmentation fault? What's the failure rate?
I tested locally with the steps mentioned in comment #28, and I don't see such errors on my side.

It happens after the camera has been re-initialised a number of times. If you wrap that command in a while-true loop on the command line, that mimics how I run it.

while true; do gst-launch-1.0 nvarguscamerasrc sensor-id=6 ! 'video/x-raw(memory:NVMM),framerate=30/1,format=NV12' ! nvvidconv ! xvimagesink; done
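To make long unattended runs easier to line up with the nvargus-daemon entries in the systemd journal, the loop can print a timestamp and the exit status of every run. This is a minimal bash sketch, not part of the original report: the function name `run_stress_loop` is hypothetical, and it relies on bash's convention that a process killed by signal N reports status 128+N (139 for SIGSEGV). Note that the segfault discussed here is in the daemon, so the client may not itself die by signal; the timestamps are mainly for correlation.

```shell
#!/usr/bin/env bash
# Hypothetical stress-loop helper: re-runs a command and logs how each run ended.
# Usage: run_stress_loop MAX_RUNS CMD [ARGS...]   (MAX_RUNS=0 means loop forever)
run_stress_loop() {
    local max_runs=$1
    shift
    local run=0 status
    while [ "$max_runs" -eq 0 ] || [ "$run" -lt "$max_runs" ]; do
        run=$((run + 1))
        # Capture the exit status without tripping `set -e` in the caller.
        if "$@"; then
            status=0
        else
            status=$?
        fi
        # Bash reports 128+N when a child is killed by signal N (139 = SIGSEGV).
        if [ "$status" -gt 128 ]; then
            echo "$(date '+%F %T') run=$run killed by signal $((status - 128))"
        else
            echo "$(date '+%F %T') run=$run exit=$status"
        fi
    done
}
```

For the pipeline above this would be `run_stress_loop 0 gst-launch-1.0 nvarguscamerasrc sensor-id=6 ! 'video/x-raw(memory:NVMM),framerate=30/1,format=NV12' ! nvvidconv ! xvimagesink`, and the printed timestamps can then be compared against `journalctl -u nvargus-daemon`.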

I had another crash last night after several hours.
argus.log (10.6 KB)

Hello casperlyngesen.mogensen,

According to the logs, there's a timeout, and the software stack is handling this error.
The internal process needs 5 seconds (or more) to recover its state.

May I know what exactly the failure is?
For example,
is there intermittent signaling on the sensor side?
Or have you disconnected/connected the camera device physically?

Let me double-confirm what happens after the segmentation fault.
(1) Are you able to interrupt the process and re-run the gst pipeline?
(2) Is it possible to recover by restarting the Argus daemon?
$ sudo pkill nvargus-daemon
$ sudo systemctl start nvargus-daemon

Hi

Yes, I can recover just fine. I do not need to restart nvargus-daemon; I just need to restart my application.

And it is intermittent signaling. A camera disconnect fails completely with Python and OpenCV (not Argus related).

Hello casperlyngesen.mogensen,

Thanks for the clarification; it looks like the error handling mechanism is working as expected.

Yes, I agree. It is only the segmentation fault that causes problems now.

Let me clarify:
since there's intermittent signaling, the camera pipeline reports timeout failures. Argus reports this via EVENT_TYPE_ERROR, and the application has to shut down.
The segmentation fault is expected to force-stop the application. Since you've also confirmed that the camera works after restarting the application, the Argus error handling mechanism is functional.
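Since recovery in this thread means restarting the application after Argus reports EVENT_TYPE_ERROR, that restart can be automated. The following is a minimal bash sketch under stated assumptions: the application exits with a non-zero status (or is killed by a signal) when the error occurs, and the names `supervise` and `my_camera_app.py` are hypothetical. The default delay follows the "5 seconds (or more)" recovery time mentioned earlier in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical supervisor: restarts a command after a failed exit, waiting
# between attempts so the camera stack has time to recover.
# Usage: supervise MAX_RESTARTS DELAY_SECONDS CMD [ARGS...]
# Returns 0 when CMD exits cleanly, 1 when the restart budget is used up.
supervise() {
    local max_restarts=$1 delay=$2
    shift 2
    local restarts=0 status
    while true; do
        if "$@"; then
            return 0                # clean exit: nothing to recover from
        else
            status=$?               # non-zero exit or 128+N for signal N
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restarts (last status $status)" >&2
            return 1
        fi
        restarts=$((restarts + 1))
        echo "status $status, restart $restarts/$max_restarts in ${delay}s" >&2
        sleep "$delay"
    done
}
```

A typical call would be `supervise 10 5 python3 my_camera_app.py`. This only helps when the application actually exits; the occasional hang reported above would additionally need a watchdog (for example a frame-timeout check inside the application) to turn the hang into an exit.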