Argus: Can we handle hardware timeout?

Occasionally Argus has issues and likes to die ungracefully with a SEGFAULT.

In those situations your console looks something like this:

SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)
Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/incoming cc
launchCC abort cc 104 session 0
SCF: Error Timeout:  (propagating from src/api/Session.cpp, function capture(), line 830)
(Argus) Error Timeout:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
launchCC abort cc 105 session 0
SCF: Error Timeout:  (propagating from src/api/Session.cpp, function capture(), line 830)
(Argus) Error Timeout:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
launchCC abort cc 106 session 0
(Argus) Error InvalidState: MetadataResult callback for unknown capture. (in src/api/CaptureSessionImpl.cpp, function metadataResult(), line 705)
SCF: Error Timeout:  (propagating from src/api/Session.cpp, function capture(), line 830)
(Argus) Error Timeout:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
launchCC abort cc 107 session 0
SCF: Error Timeout:  (propagating from src/api/Session.cpp, function capture(), line 830)
(Argus) Error Timeout:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
launchCC abort cc 108 session 0(Argus) Error InvalidState: MetadataResult callback for unknown capture. (in src/api/CaptureSessionImpl.cpp, function metadataResult(), line 705)

SCF: Error Timeout:  (propagating from src/api/Session.cpp, function capture(), line 830)
(Argus) Error InvalidState: MetadataResult callback for unknown capture. (in src/api/CaptureSessionImpl.cpp, function metadataResult(), line 705)
(Argus) Error Timeout:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
(Argus) Error InvalidState: MetadataResult callback for unknown capture. (in src/api/CaptureSessionImpl.cpp, function metadataResult(), line 705)
launchCC abort cc 109 session 0
SCF: Error Timeout:  (propagating from src/api/Session.cpp, function capture(), line 830)
(Argus) Error Timeout:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
(Argus) Error InvalidState: MetadataResult callback for unknown capture. (in src/api/CaptureSessionImpl.cpp, function metadataResult(), line 705)
...
Segmentation fault (core dumped)

Is there any way to handle this gracefully, or is it just a matter of restarting the process?

hello Atrer,

An Argus application handles camera streaming and passes the frames to user space, for example for live preview.
Usually this timeout failure comes from the low-level driver; it may be caused by inconsistent sensor signaling.
Alternatively, there may be a race condition during buffer transfer.

May I know your use case? Please share the commands or steps to reproduce the failures.
Please also share some background details, such as which JetPack release you're working with.
Thanks

Hi Atrer,
To the best of my knowledge on LibArgus, I believe the ideal way to terminate an application is by following these steps in the order mentioned:

  • Call stopRepeat() or cancelRequests() to stop capturing frames
  • Disable the output stream using the IRequest interface
  • Disconnect the output stream using the IEGLOutputStream interface
  • Delete the Request object
  • Delete the Output Stream object
  • Delete the Capture Session object
  • Delete the Camera Provider object

We are still using JetPack 3.3 (L4T 28.2.1).

I will have to put together a test if I have time. It seems that if I invoke Argus too soon after boot (using systemd), it does not start properly. My assumption is that something is not initialized quite yet, be it the cameras themselves or some other part of the L4T stack. This is not a blocking issue at all: if we wait a bit before starting things, everything works great. I was just hoping there might be some way to catch that failure and restart without waiting for the segfault.

Right now we wait on network-online.target. Is there another target I should wait on before invoking Argus through systemd? I notice that nvcamera-daemon.service waits for “nv.service”.

hello Atrer,

We would like to reproduce the issue on our reference platforms.
Could you please share the scripts or the steps you use to launch Argus after boot-up?
Thanks

This could very well just be a repeated-start issue from not shutting down properly. Right now I am doing the following based off of what I could find in relevant samples:

  1. ICaptureSession.stopRepeat()
  2. UniqueObj.reset()
  3. UniqueObj.reset()

I'm going to try to recreate the issue reliably. If I can, I’ll make another post, but the answer to the original question seems to be no.

My question was just if we could handle Argus crashes (to ease debugging). I’m certain now that the segfault is of my own making.
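Since the crash apparently cannot be caught inside the process, the remaining option from the original question is to restart it from outside. Below is a minimal sketch of that idea using plain POSIX calls (fork, execvp, waitpid), not anything from Argus itself. The names ChildResult, run_once and supervise are hypothetical, and the command to launch is whatever your capture application is.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <cstdio>
#include <vector>

// Outcome of one run of the child process (hypothetical helper type).
struct ChildResult {
    bool killed_by_signal;  // true if the child died from a signal, e.g. SIGSEGV
    int code;               // signal number if killed_by_signal, else exit status
                            // (-1 if fork/waitpid itself failed)
};

// Forks, execs argv (argv[0] is looked up on PATH) and waits for it to end.
ChildResult run_once(const std::vector<const char*>& argv) {
    // execvp wants a null-terminated array of non-const char pointers.
    std::vector<char*> args;
    for (const char* a : argv) args.push_back(const_cast<char*>(a));
    args.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return {false, -1};
    if (pid == 0) {
        execvp(args[0], args.data());
        _exit(127);  // only reached if exec failed
    }

    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);  // retry if interrupted by a signal
    } while (w < 0 && errno == EINTR);
    if (w < 0) return {false, -1};

    if (WIFSIGNALED(status)) return {true, WTERMSIG(status)};
    return {false, WEXITSTATUS(status)};
}

// Reruns the child while it keeps dying from a signal, at most max_restarts
// extra times, sleeping delay_seconds between attempts. Returns the last result.
ChildResult supervise(const std::vector<const char*>& argv,
                      int max_restarts, unsigned delay_seconds) {
    ChildResult r = run_once(argv);
    for (int i = 0; i < max_restarts && r.killed_by_signal; ++i) {
        std::fprintf(stderr, "child killed by signal %d, restarting\n", r.code);
        sleep(delay_seconds);
        r = run_once(argv);
    }
    return r;
}
```

This only reacts after the process has already died, so it does not answer the "catch it before the segfault" part. When the application is started from systemd anyway, the unit's own restart settings may be the simpler route.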

As for launching Argus too early, I think it is unrelated to my specific issue. But here is a post by Honey_Patouceul on a similar issue.
https://devtalk.nvidia.com/default/topic/1050435/jetson-tx2/kernel-panic-on-running-nvcamera-daemon-processes/post/5331411/#5331411

I have found that you can add After=argus-daemon.service to the [Unit] section of your service file, which prevents launching Argus before the daemon is ready.

My post mentioned that nvcamera-daemon (used on older releases) needed some time at startup to create the socket in the /tmp directory through which it communicates with applications.
So the script in /etc/rc.local checked whether the socket had already been created. If not, it logged the time for measurement, and it started a GStreamer pipeline using nvcamera only after the socket had been created.

You may adapt this for your case. Argus does create a socket in the same /tmp directory (check its name there). In your case you are launching the application from systemd, so the timing might be different, but the approach would be the same: don’t start an nvarguscamera application before the socket in /tmp exists.
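The wait described above could be sketched roughly like this, done inside the application instead of a shell script. It only uses POSIX stat() and the C++11 standard library, so it should build with the older toolchains on these releases. The function name wait_for_socket is hypothetical, and the socket path is deliberately left as a parameter, since the actual name has to be checked in /tmp on the target.

```cpp
#include <sys/stat.h>

#include <chrono>
#include <string>
#include <thread>

// Polls until socket_path exists and is a Unix domain socket, or until
// timeout has elapsed. Returns true if the socket showed up in time.
// ASSUMPTION: socket_path is supplied by the caller; the real name of the
// daemon's socket is not known here and must be looked up in /tmp.
bool wait_for_socket(
    const std::string& socket_path,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(100)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        struct stat st;
        // stat() fails while the file does not exist yet; that is expected.
        if (::stat(socket_path.c_str(), &st) == 0 && S_ISSOCK(st.st_mode)) {
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(poll_interval);
    }
}
```

The application would call this once at startup, before creating the camera provider, and exit with an error (or retry later) if it returns false. Note that the socket existing only shows the daemon has got as far as creating it, which is the same condition the rc.local script relied on.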