Multi-camera acquisition crashes with Argus SCF_AutocontrolACSync error

Hi,

We have an API that grabs images from two cameras. It runs fine for several hours, until Argus appears to crash with the following error:

nvargus-daemon[31084]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
nvargus-daemon[31084]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
nvargus-daemon[31084]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
nvargus-daemon[31084]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
nvargus-daemon[31084]: SCF: Error Timeout: Sending critical error event for Session 0
nvargus-daemon[31084]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
nvargus-daemon[31084]: waitForIdleLocked remaining request 223013
nvargus-daemon[31084]: waitForIdleLocked remaining request 223004
nvargus-daemon[31084]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
nvargus-daemon[31084]: SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function assignAllBuffersFromStream(), line 271)
nvargus-daemon[31084]: SCF: Error Timeout:  (propagating from src/components/stages/CCDataSetupStage.cpp, function doHandleRequest(), line 76)
nvargus-daemon[31084]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
nvargus-daemon[31084]: PowerServiceCore:handleRequests: timePassed = 16980
nvargus-daemon[31084]: PowerServiceCore:handleRequests: timePassed = 2238
nvargus-daemon[31084]: waitForIdleLocked remaining request 223004
nvargus-daemon[31084]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
nvargus-daemon[31084]: SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 734)
nvargus-daemon[31084]: (Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 110)

We are running JP5.1.2 with already-patched Argus, fusacap and SCF libraries. Their hashes are:

## Libraries
 LIBARGUS hash            : b6f55f6cb2de004bcb4a5238f9ed3938  /usr/lib/aarch64-linux-gnu/tegra/libnvargus.so
 LIBFUSACAP hash          : 06191cfcec47e09afb3539b2e19dc11c  /usr/lib/aarch64-linux-gnu/tegra/libnvfusacap.so
 LIBSCF hash              : 689cde0c4539304b12894923b879383b  /usr/lib/aarch64-linux-gnu/tegra/libnvscf.so

What can we do to prevent this error?

Hi @PS1234,
What hardware configuration are you using?
We had issues with unstable SerDes links where the signal occasionally drops, producing errors similar to yours. If you are using plain CSI-2/MIPI cameras, check whether the cable is too long or exposed to outside interference. If you are using SerDes, check the link quality via the CRC error counters or the link status registers.

Regards,
Andres
Embedded SW Engineer at RidgeRun

Hi Andres,

Thank you for the information. We do use a SerDes configuration, but we had never seen this error before and the acquisition was stable.

Now, with a slightly different camera revision, this error always occurs after 2-3 hours. If it were caused by cable interference or a drop in signal strength, wouldn't it be more random?

It may be worth mentioning:

The test setup is two cameras, each streaming about 100 images. We then stop the acquisition and start it again, and repeat this in an endless loop.
The error seems to happen during the stop/start sequence, not while we are expecting images.
I don't know whether the waitForIdle timeout is a follow-up error or directly linked to the first one.

This is a major stability problem, and I would be grateful for any help.

I just had another crash. The application crashed while in waitForIdle, and the error message was the same as above.
We have added logging to the imx driver that writes to the system log whenever certain driver functions are called. When this SCF_AutocontrolACSync error occurred, the stop command that tells the sensor to stop streaming was never called, although it had been called in every previous iteration.

We stop the sensor like this:

// Ensure the capture session is stopped and idle
  if (m_iCaptureSession != nullptr)
  {
    m_iCaptureSession->stopRepeat();
    m_iCaptureSession->cancelRequests();
    Argus::Status status = m_iCaptureSession->waitForIdle(kWaitForIdle); // 30s

Is it possible that Argus internally sometimes misses the call to the sensor driver?
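To narrow down which of the three calls the session hangs in, one option is to log entry and exit of each stop step with a timestamp, so the application log can be lined up against the driver's system log entries. The following is only a minimal sketch: StopStep and runStopSequence are hypothetical helper names, not part of Argus, and each step is a plain callable so the helper can be tried without a camera attached.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <sstream>   // std::ostringstream is handy for capturing the log
#include <string>
#include <vector>

// Hypothetical helper type: one named step of a shutdown sequence.
// `run` returns true on success.
struct StopStep {
    std::string name;
    std::function<bool()> run;
};

// Runs the steps in order and writes an "enter" and a "leave" line for each.
// Stops at the first failing step and returns its name; returns an empty
// string if every step succeeded.
std::string runStopSequence(const std::vector<StopStep>& steps, std::ostream& log) {
    using Clock = std::chrono::steady_clock;
    for (const StopStep& step : steps) {
        assert(step.run && "every step needs a callable");
        log << "enter " << step.name << '\n';
        const Clock::time_point begin = Clock::now();
        const bool ok = step.run();
        const long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                                 Clock::now() - begin).count();
        log << "leave " << step.name << " ok=" << (ok ? 1 : 0)
            << " elapsed_ms=" << ms << '\n';
        if (!ok) {
            return step.name;
        }
    }
    return std::string();
}
```

In the application, the three steps would presumably wrap m_iCaptureSession->stopRepeat(), m_iCaptureSession->cancelRequests() and m_iCaptureSession->waitForIdle(kWaitForIdle), with the last two compared against Argus::STATUS_OK. An "enter" line without a matching "leave" line in the log would then point at the call that never returned.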

Hi,

I just enabled more Argus logs and found that this actually changes the behavior. The application now crashes much later than it does without those logs enabled:

sudo service nvargus-daemon stop
sudo su
export enableCamPclLogs=5
export enableCamScfLogs=5
/usr/sbin/nvargus-daemon 

So I suspect some kind of timing issue in nvargus-daemon. Is this plausible?

The error also changed: this time the call cuEGLStreamConsumerAcquireFrame(m_cudaConnection.get(), &pCudaResource, &pCudaStream, timoutAcquireFrame); times out. This happened after the application had been running for more than 10 hours, on about the 3700th loop iteration.

dmesg looks like this:

[RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=66547589909696 sof_ts=66547630412096 gerror_code=2 gerror_data=400060 notify_bits=4000
[66531.811506 <    1,005298>] host1x 13e40000.host1x: ViCsiHw frameCo: syncpoint id 43 (progress) stuck waiting 5532185, timeout=36000"
[66531.811880 <    0,000374>] ---- syncpts ----
[66531.811908 <    0,000028>] id 20 (ga10b_511) min 29 max 0 refs 1 (previous client : )
[66531.811914 <    0,000006>] id 21 (ga10b_510) min 9 max 0 refs 1 (previous client : )
[66531.811920 <    0,000006>] id 22 (ga10b_509) min 16 max 12 refs 1 (previous client : ga10b_509)
[66531.811926 <    0,000006>] id 23 (progress) min 9588489 max 894777 refs 1 (previous client : progres
The output continues with this ga error repeating.
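Since the failure only shows up after hours of looping, it may help to pull the stuck syncpoint out of each dmesg capture automatically and compare it across runs. The sketch below parses the host1x "stuck waiting" line shown above. StuckSyncpoint and parseStuckSyncpoint are hypothetical names, the pattern is derived only from the single line in this log, and the meaning of the number after "stuck waiting" (presumably the value being waited for) is an assumption.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Fields taken from one host1x "stuck waiting" line.
struct StuckSyncpoint {
    unsigned long id = 0;              // syncpoint id, e.g. 43
    std::string name;                  // name in parentheses, e.g. "progress"
    unsigned long long waitValue = 0;  // number after "stuck waiting" (assumed: awaited value)
    unsigned long long timeout = 0;    // number after "timeout=", unit as printed by the kernel
};

// Returns true and fills `out` if `line` contains a host1x
// "syncpoint id N (name) stuck waiting V, timeout=T" message.
bool parseStuckSyncpoint(const std::string& line, StuckSyncpoint& out) {
    static const std::regex pattern(
        R"(syncpoint id (\d+) \(([^)]*)\) stuck waiting (\d+), timeout=(\d+))");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) {
        return false;
    }
    out.id = std::stoul(m[1].str());
    out.name = m[2].str();
    out.waitValue = std::stoull(m[3].str());
    out.timeout = std::stoull(m[4].str());
    return true;
}
```

Feeding every dmesg line through parseStuckSyncpoint and recording the id and name per crash would show whether it is always the same syncpoint that stalls, which could support or weaken the timing-issue theory.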