Libargus Pipeline Health

Hi,

I have a question regarding the libargus pipeline health. We observed issues like the following:

[1365170.463441] fence timeout on [ffffffc09a4bb180] after 1500ms
[1365170.489067] fence timeout on [ffffffc09a4bbcc0] after 1500ms
[1365170.489073] name=[nvhost_sync:6], current value=15264653 waiting value=15264654
[1365170.489078] ---- mlocks ----
....

These occur in certain situations, e.g. (a) problems with the external trigger, or (b) out-of-memory conditions and high system load.

In our application, this does not lead to errors directly. We observe a lag during image acquisition or even old images repeating (last camera frame frozen). What we would like to have is a health status of the complete image acquisition pipeline (including Driver/ISP/VI etc.) along with the image acquisition.

Questions:
(1) Is there any possibility to query the current pipeline health or diagnose errors from libargus?
(2) Is it possible to check the Driver/ISP/VI state from system level (preferably via API) other than scanning kernel logs?
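For reference, the kernel-log scanning we would like to replace looks roughly like the sketch below. The regular expressions are derived only from the lines quoted above, so treating them as the general message format is an assumption; `scan_kernel_log` and the dictionary keys are illustrative names, and the input could be any iterable of log lines (for example the output of `dmesg`).

```python
import re

# Patterns modelled on the kernel log lines quoted in this post (assumed format).
FENCE_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+fence timeout on \[(?P<fence>[0-9a-f]+)\] "
    r"after (?P<ms>\d+)ms"
)
SYNCPT_STATE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+name=\[(?P<name>[^\]]+)\], "
    r"current value=(?P<current>\d+) waiting value=(?P<waiting>\d+)"
)


def scan_kernel_log(lines):
    """Return (fence_timeouts, syncpoints) found in an iterable of log lines."""
    timeouts, syncpts = [], []
    for line in lines:
        m = FENCE_TIMEOUT.search(line)
        if m:
            timeouts.append(
                {"ts": float(m["ts"]), "fence": m["fence"], "ms": int(m["ms"])}
            )
            continue
        m = SYNCPT_STATE.search(line)
        if m:
            syncpts.append(
                {
                    "ts": float(m["ts"]),
                    "name": m["name"],
                    "current": int(m["current"]),
                    "waiting": int(m["waiting"]),
                }
            )
    return timeouts, syncpts


SAMPLE = [
    "[1365170.463441] fence timeout on [ffffffc09a4bb180] after 1500ms",
    "[1365170.489067] fence timeout on [ffffffc09a4bbcc0] after 1500ms",
    "[1365170.489073] name=[nvhost_sync:6], current value=15264653 waiting value=15264654",
    "[1365170.489078] ---- mlocks ----",
]

timeouts, syncpts = scan_kernel_log(SAMPLE)
```

This only tells us after the fact that a fence timed out and which syncpoint was stuck one count short of the value being waited for; it says nothing about which pipeline stage caused it, which is why an API would be preferable.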

I would appreciate any pointers on that issue.

Best,

Axel

Sorry, no. But you can refer to the userAutoExposure sample in MMAPI for the error handling.
What’s the sensor module?

Sensor module is IMX412 (VisionComponents)

How does libargus handle EGL_SUPPORT_REUSE_NV? Is there a way to configure it, as described here:

Acquire an image frame from EGLStreamKHR. This API can also acquire an old frame presented by the producer unless explicitly disabled by setting EGL_SUPPORT_REUSE_NV flag to EGL_FALSE during stream initialization. By default, EGLStream is created with this flag set to EGL_TRUE. cuGraphicsResourceGetMappedEglFrame can be called on pCudaResource to get CUeglFrame.

It seems that we receive the last frame multiple times (one repeat per timeout) when the Argus pipeline has issues. We still get a capture success, even though the pipeline is already dead.

The same behavior occurs in software-trigger mode: we receive an old frame once per timeout period, without EGLOutputStream marking it as a timeout…

Best,

Axel

I don’t think so.
If you are concerned about EGL, you can try another consumer instead of EGL. You can check the sample code in the MMAPI package.

Hi Shane,

thank you for your pointers so far. I now have a somewhat better solution that combines the events (as shown in userAutoExposure) with the metadata timestamps to avoid duplicate images and to generate timeout events. This solves about 90% of the problem. The remaining 10% is raising a valid degraded/fatal error when the image pipeline becomes dysfunctional at some point and the application should be restarted:

sending valid frame   
PowerServiceCore:handleRequests: timePassed = 54466
SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)
sending valid frame   
[ArgusCamera] received error event
sending valid frame   
sending valid frame   
Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/incoming cc
[pipeline stalled]

Unfortunately, we cannot bind to the error event from libargus, as this is not always emitted in this situation. Also, it is emitted in other situations as well (pipeline restart after e.g. trigger mode change).

Is there any way to receive the SCF error? It is obviously emitted somewhere in our executable within the MMAPI/SCF libraries, but I did not find any handle on it.
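The timestamp part of the workaround mentioned above can be sketched roughly as follows. This is a simplified illustration, not our production code: `FrameMonitor`, `on_frame` and `timed_out` are hypothetical names, `sensor_ts` stands for whatever timestamp the capture metadata provides, and `now` is a local monotonic clock reading in seconds.

```python
class FrameMonitor:
    """Flags repeated frames and missing frames using metadata timestamps."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_sensor_ts = None  # timestamp of the last new frame
        self.last_fresh_at = now    # local time at which a new frame last arrived

    def on_frame(self, sensor_ts, now):
        """Return True for a new frame, False for a repeat of an old one."""
        if self.last_sensor_ts is not None and sensor_ts <= self.last_sensor_ts:
            return False  # same (or older) timestamp: a re-sent old frame
        self.last_sensor_ts = sensor_ts
        self.last_fresh_at = now
        return True

    def timed_out(self, now):
        """Return True if no new frame arrived within the timeout."""
        return now - self.last_fresh_at > self.timeout_s


monitor = FrameMonitor(timeout_s=1.5, now=0.0)
first = monitor.on_frame(sensor_ts=100, now=0.1)   # new frame
repeat = monitor.on_frame(sensor_ts=100, now=1.6)  # old frame delivered again
```

Repeated frames are dropped instead of being forwarded as valid, and `timed_out` produces the timeout event that the consumer itself does not report. It still cannot tell a dead pipeline from one that is merely idle.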

Best,

Axel

When you get the message below, the app needs to terminate and be restarted.

[ArgusCamera] received error event

Yes. Obviously! Again: this is not always emitted in this situation…

But I think I have found somewhat of a solution nevertheless:

  • use error events (if emitted)
  • use frame success events for correct timeout (not possible with cuEGL as it re-sends old frames on timeout and does not send timeout codes)
  • decide with timeout what to do: free-run → pipeline dead // trigger mode + dangling trigger events → pipeline dead // trigger mode + no trigger → ignore
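The timeout decision in the last bullet can be written down as a small function. This is only a sketch of the rule as stated; `Verdict`, `on_timeout`, `trigger_mode` and `pending_triggers` are names made up for illustration, where `pending_triggers` counts triggers that were sent but have not yet produced a frame.

```python
from enum import Enum


class Verdict(Enum):
    PIPELINE_DEAD = "pipeline dead"
    IGNORE = "ignore"


def on_timeout(trigger_mode, pending_triggers):
    """Decide what a frame timeout means for the pipeline."""
    if not trigger_mode:
        # Free-run: frames should always arrive, so a timeout is fatal.
        return Verdict.PIPELINE_DEAD
    if pending_triggers > 0:
        # Trigger mode with dangling triggers: a frame was expected.
        return Verdict.PIPELINE_DEAD
    # Trigger mode without any trigger: no frame was expected.
    return Verdict.IGNORE
```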

Interestingly, it sends Argus::STATUS_TIMEOUT events in case of OOM / SCF crash and STATUS_CANCELLED in case of pipeline restarts (e.g. a change of camera mode). I never see DISCONNECTED or OUT_OF_MEMORY, which I would have expected.
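Based purely on these observations, the status codes could be mapped to a coarse health hint as sketched below. The status names are passed as plain strings standing in for the Argus::Status values, and both `classify_status` and the returned labels are hypothetical; the mapping reflects what we saw on our system, not documented libargus behavior.

```python
def classify_status(status_name):
    """Map an Argus status name to a coarse health hint (observed behavior only)."""
    if status_name == "STATUS_OK":
        return "healthy"
    if status_name == "STATUS_CANCELLED":
        # Seen on pipeline restarts, e.g. a change of camera mode.
        return "restarting"
    if status_name == "STATUS_TIMEOUT":
        # Seen on OOM / SCF crash; needs the timeout decision to confirm.
        return "suspect"
    # Anything else (e.g. STATUS_DISCONNECTED) was never seen in our tests.
    return "unexpected"
```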

I think we can close the issue. BUT: it would be nice to be able to configure cuEGL / EGLStream with EGL_SUPPORT_REUSE_NV off via the Argus API (is libargus still actively developed?)!

Best,

Axel