Long time start/stop with argus leads to autocontrol SCF_AutocontrolACSync error

Hi!

Unfortunately I am forced to open a second issue, as this problem still persists (see here: Multi Camera aquisition crashes with argus SCF_AutocontrolACSync error).

We still see this SCF_AutocontrolACSync error after running our test for several hours. We don't see anything wrong with our Argus pipeline, as it runs stably for several hours when we do not stop the sensors, and we followed the Argus samples for starting and stopping the image acquisition.
We already tried the suggestion from Libnvargus report "SCF_AutocontrolACSync failed to wait for an earlier frame to complete", without any noticeable effect.

One thing we observed is that when this error occurs, one of the stop_streaming methods seems to take over 15 seconds to complete before we can stop another sensor. See the sample from dmesg with delta timings below.

[<    3,995971>] sensor111-0040:  sensor1_stop_streaming entered.
[<    0,018151>] sensor2 11-001a:  sensor2_stop_streaming entered.
[<    1,776255>] sensor3 10-001a:   sensor3_stop_streaming entered.
[<   15,299483>] sensor4 10-0040:  sensor4_stop_streaming entered.
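For longer dmesg captures, a small script could flag these gaps automatically. This is only a sketch: it assumes the delta on each line is the time since the previous message, that the decimal separator may be a comma as above, and the function name `slow_gaps` and the 5 second threshold are made up for illustration.

```python
import re

# Matches lines such as:
# [<   15,299483>] sensor4 10-0040:  sensor4_stop_streaming entered.
LINE_RE = re.compile(r"\[<\s*(\d+)[.,](\d+)>\].*?\b(\w+_stop_streaming) entered")


def slow_gaps(lines, threshold_s=5.0):
    """Return (delta_seconds, previous_call, current_call) for large gaps.

    The delta printed on a line is the time elapsed since the previous
    message, so a large delta points at whatever ran after the previous
    stop_streaming call was entered.
    """
    hits = []
    previous = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        delta = float(f"{match.group(1)}.{match.group(2)}")
        current = match.group(3)
        if delta > threshold_s:
            hits.append((delta, previous, current))
        previous = current
    return hits


sample = [
    "[<    3,995971>] sensor111-0040:  sensor1_stop_streaming entered.",
    "[<    0,018151>] sensor2 11-001a:  sensor2_stop_streaming entered.",
    "[<    1,776255>] sensor3 10-001a:   sensor3_stop_streaming entered.",
    "[<   15,299483>] sensor4 10-0040:  sensor4_stop_streaming entered.",
]

for delta, prev_call, cur_call in slow_gaps(sample):
    print(f"{delta:.3f} s between {prev_call} and {cur_call}")
```

On the sample above, this would point at the time between entering sensor3_stop_streaming and entering sensor4_stop_streaming.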

It would really help us if someone could explain when this SCF_AutocontrolACSync error can occur and how we can further debug it. We are also able to change driver code and add more debug calls to dmesg if this helps.

thanks

What’s the version?

We are running JetPack 5.1.2.

The used argus, fusacap and nvscf library versions are updated and taken from this issue:

Apply below lib and update RCE firmware to verify the problem.

libnvfusacap.so.r35.4_none_nofity (193.6 KB)
camera-rtcpu-t234-rce.img.r35.4.1.asynchronous (519.3 KB)

Hi Shane,

thank you for the provided update files. We flashed the RCE and updated the fusacap library and ran the same test again. Preliminary results are that the application still crashes after several hours.
We will verify that it is the same error but the behaviour is very similar to the state before we did the update.

Hi, a further update from a second test iteration:

It seems that the error is still the same.

Please copy those messages here instead of a screenshot.

Thanks

Sorry for that. Here it is written out:

Sep 18 19:50:38 nvidia-desktop nvargus-daemon[2938]: PowerServiceCore: handleRequests: timePassed = 1023
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Sep 18 19:56:40 nvidia-desktop nvargus-daemon[2938]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: Sending critical error event for Session 0
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: Sending critical error event for Session 0
Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Sep 18 19:50:45 nvidia-desktop nvargus-daemon[2938]: waitForIdleLocked remaining request 80102
Sep 18 19:50:45 nvidia-desktop nvargus-daemon[2938]: waitForIdleLocked remaining request 80093
Sep 18 19:50:45 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969) 
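To compare runs, the relevant entries could be pulled out of the journal with a short script. This is a rough sketch that only relies on the message texts visible in the log above; the function name `summarize` and the returned dictionary layout are invented for illustration.

```python
import re

ACSYNC_TEXT = "SCF_AutocontrolACSync failed to wait for an earlier frame to complete"
REMAINING_RE = re.compile(r"waitForIdleLocked remaining request (\d+)")
# Syslog-style prefix, e.g. "Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]:"
PREFIX_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+nvargus-daemon\[(\d+)\]:")


def summarize(lines):
    """Collect ACSync failure timestamps and the pending request IDs."""
    acsync_stamps = []
    remaining = set()
    for line in lines:
        prefix = PREFIX_RE.match(line)
        if not prefix:
            continue  # not an nvargus-daemon journal line
        if ACSYNC_TEXT in line:
            acsync_stamps.append(prefix.group(1))
        pending = REMAINING_RE.search(line)
        if pending:
            remaining.add(int(pending.group(1)))
    return {
        "acsync_count": len(acsync_stamps),
        "acsync_stamps": acsync_stamps,
        "remaining_requests": sorted(remaining),
    }


sample = [
    "Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.",
    "Sep 18 19:50:40 nvidia-desktop nvargus-daemon[2938]: SCF: Error Timeout: Sending critical error event for Session 0",
    "Sep 18 19:50:45 nvidia-desktop nvargus-daemon[2938]: waitForIdleLocked remaining request 80102",
    "Sep 18 19:50:45 nvidia-desktop nvargus-daemon[2938]: waitForIdleLocked remaining request 80093",
]

print(summarize(sample))
```

Feeding it the output of `journalctl -u nvargus-daemon` from several runs would make it easier to see whether the number of pending requests or the timing pattern changes between library versions.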

Could you boost the clocks and try again?

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
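Before rerunning a multi-hour test, it may be worth reading the values back to confirm the boost actually took effect. The following is a sketch only: it assumes the debugfs files used in the commands above contain plain integer text and that `mrq_rate_locked` reads back as `1`, and the function name `read_clock_state` is hypothetical. It needs root on the target, like the commands above.

```python
from pathlib import Path

CLOCKS = ("vi", "isp", "nvcsi", "emc")


def read_clock_state(base="/sys/kernel/debug/bpmp/debug/clk", clocks=CLOCKS):
    """Read rate, max_rate and lock flag for each clock under `base`."""
    state = {}
    for name in clocks:
        clk_dir = Path(base) / name
        rate = int((clk_dir / "rate").read_text().strip())
        max_rate = int((clk_dir / "max_rate").read_text().strip())
        # Assumption: the lock flag reads back as the text "1" when set.
        locked = (clk_dir / "mrq_rate_locked").read_text().strip() == "1"
        state[name] = {
            "rate": rate,
            "max_rate": max_rate,
            "locked": locked,
            "boosted": locked and rate == max_rate,
        }
    return state


# Example use on the target (as root):
#   for name, info in read_clock_state().items():
#       print(name, info)
```

If any clock reports `boosted` as false after the commands were run, the test result would not really say anything about clock rates.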

Hi Shane,

Unfortunately it still crashes with the same behaviour, even after boosting the clocks:

Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout: Sending critical error event for Session 0
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout: Sending critical error event for Session 0
Sep 19 12:03:39 nvidia-desktop nvargus-daemon[2916]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Sep 19 12:03:44 nvidia-desktop nvargus-daemon[2916]: waitForIdleLocked remaining request 119429
Sep 19 12:03:44 nvidia-desktop nvargus-daemon[2916]: waitForIdleLocked remaining request 119420
Sep 19 12:03:44 nvidia-desktop nvargus-daemon[2916]: waitForIdleLocked remaining request 119429
Sep 19 12:03:44 nvidia-desktop nvargus-daemon[2916]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)

Please also apply the libs below.

[Argus][multi-camera] fix memory corruption within libnvargus Argus pipeline randomly gets error - #4 by JerryChang

[Argus] fixes for set_mode_delay_ms and infinite timeout support Infinite timeout support - #10 by JerryChang

Hi Shane,

Thank you, we will add them and test again. Just for your information, we already use the non-standard versions of the 3 libraries from this issue: Argus image acquisition crashes after a few days. Are they older, or do they not include those fixes?

Using those libraries, the test also crashes after roughly an hour:

Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout: Sending critical error event for Session 0
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]: SCF: Error Timeout: Sending critical error event for Session 0
Sep 20 11:45:36 nvidia-desktop nvargus-daemon[2880]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Sep 20 11:45:41 nvidia-desktop nvargus-daemon[2880]: waitForIdleLocked remaining request 34281

What’s your test?

We work with two IMX715 sensors from Framos and basically run a loop of starting the image acquisition, grabbing 10 images, and stopping the acquisition again.
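In outline, the loop looks roughly like the sketch below. This is not our actual code: the real test uses libargus from C++, and `start`, `grab` and `stop` here are placeholder callables standing in for whatever starts the capture, fetches one frame and stops the capture. The timing around `stop` is an addition meant to show how slow stops could be logged from user space and lined up with the dmesg timings.

```python
import time


def stress_loop(start, grab, stop, iterations, frames_per_cycle=10, slow_stop_s=5.0):
    """Run start / grab N frames / stop repeatedly and report slow stops.

    Returns a list of (iteration_index, stop_duration_seconds) for every
    cycle in which stop() took longer than slow_stop_s.
    """
    slow = []
    for i in range(iterations):
        start()
        for _ in range(frames_per_cycle):
            grab()
        t0 = time.monotonic()
        stop()
        elapsed = time.monotonic() - t0
        if elapsed > slow_stop_s:
            slow.append((i, elapsed))
    return slow


# Dry run with stand-in callables that do nothing:
frames = []
print(stress_loop(lambda: None, lambda: frames.append(1), lambda: None, iterations=3))
print(len(frames))
```

Logging the iteration index of the first slow stop would also show whether the failure is tied to a particular number of start/stop cycles.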

What’s the command?

It's a custom test suite using libargus. Unfortunately I can't share our code base here.

OK, please check whether any MMAPI sample code is able to reproduce the issue.

Thanks