Argus Daemon Errors - Max Frames Acquired

The issue is not fixed. 32.2 doesn’t resolve the issues. The “MaxFramesAcquired” error is no longer the problem (nor has it been for a few months); however, argus_daemon still crashes and repeats this error message forever:

Apr  5 13:05:10 tegra-ubuntu argus_daemon[1501]: (Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
Apr  5 13:05:10 tegra-ubuntu argus_daemon[1501]: SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 689)

There still seem to be memory leaks in the daemon, as reported by kmemleak-detector, and the ISP portion of the software still crashes in seemingly undetectable ways…

hello ben.lemond,

l4t-r32.2 actually has an architecture change to address the buffering issue.
It seems you have found a new issue with l4t-r32.2; I suggest you open a new ticket with the reproduction steps.
thanks

hi ben:
Is there a solution to this problem? We also have the same problem.

I’m also curious if there’s any new information on this. With 32.1 we saw the high-rate repeated error message issue very often. Mostly when starting streaming or very soon thereafter (with 6 cameras). With 32.2 I haven’t seen it happen when starting streaming, but have seen it happen after the cameras have been running for a while (20-30+ minutes). Both in multi-process and single-process builds. I’ve been trying to make a smaller standalone app that reproduces it, but haven’t been successful yet.

@ksa, @keith.wang

There has been no fix within Argus, whether it be in 32.1, 32.2 or otherwise. We have abandoned all attempted Argus implementations.

Check the current I2C clock rates:

sudo sh -c 'cat /sys/kernel/debug/bpmp/debug/clk/i2c*/rate'

You can also unlock the rate and set it manually to try:

sudo sh -c 'for f in /sys/kernel/debug/bpmp/debug/clk/i2c*/mrq_rate_locked; do echo 1 > "$f"; done'
sudo sh -c 'for f in /sys/kernel/debug/bpmp/debug/clk/i2c*/rate; do echo xxxx > "$f"; done'

Hi ShaneCCC,

Could you explain why setting the rate manually would help? Or what rate to try setting it to?

The biggest issue isn’t that Argus hits an error, but that it can get into a state where it spams those two error messages at a very fast rate, seemingly forever. That makes it difficult or impossible to recover from. It seems like NVIDIA should change the design to prevent that situation from even being possible.

For the most part 32.2 has been a major improvement over 32.1 for us. We went from hitting this issue quite often to only having it happen once or twice in a week of testing. However, the fact that it can happen is worrying.
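
Until the runaway state described above is prevented in Argus itself, an application can at least notice it from the outside. The following is only a minimal sketch: the matched message text comes from the logs quoted in this thread, while the class name, the thresholds, and the idea of feeding it lines from the daemon's log are assumptions made for illustration.

```python
import re
from collections import deque

# Message text taken from the log lines quoted in this thread.
SPAM_PATTERN = re.compile(r"Session has suffered a critical failure")


class SpamDetector:
    """Flags when the critical-failure message repeats too quickly.

    max_hits and window_s are illustrative thresholds, not NVIDIA values.
    """

    def __init__(self, max_hits=20, window_s=1.0):
        self.max_hits = max_hits
        self.window_s = window_s
        self._hits = deque()  # timestamps of recent matching lines

    def feed(self, timestamp, line):
        """Record one log line; return True once the rate limit is exceeded."""
        if not SPAM_PATTERN.search(line):
            return False
        self._hits.append(timestamp)
        # Drop hits that have fallen out of the sliding window.
        while self._hits and timestamp - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) > self.max_hits
```

A caller would pass a monotonic timestamp and each log line as it arrives. What to do when feed() returns True (tearing down the capture session, restarting the daemon, or something else) is left open here, since nothing in this thread confirms which recovery action actually works.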

Kes25c,

Run the stress program (sudo apt-get install stress). It will accelerate the testing and the occurrence of the problem.

Thanks ben.lemond. Using stress --cpu 5 while running our application makes it trigger very easily. It also happens with stress --cpu 4, but not as often. Thus far it’s always the following error (with different frame numbers):

Failed to fetch stats for frame 6763 
SCF: Error BadParameter:  (propagating from src/services/autocontrol/isp_stats/IspLegacyStatsDecoder.cpp, function decodeIspStats(), line 58)
SCF: Error BadParameter:  (propagating from src/services/autocontrol/NvCameraIspDriver.cpp, function updateStats(), line 323)
SCF: Error BadParameter:  (propagating from src/components/stages/StatsUpdateStage.cpp, function doHandleRequest(), line 170)
SCF: Error BadParameter:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error BadParameter: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 990)

Failed to fetch stats for frame 6764 
Error 0x00000004 occurred at /dvs/git/dirty/git-master_linux/camera/core_v3/camera_isp/isp/state_update/blocks/stats/LAC.cpp:664
SCF: Error BadParameter:  (propagating from src/services/autocontrol/isp_stats/IspLegacyStatsDecoder.cpp, function decodeIspStats(), line 58)
SCF: Error BadParameter:  (propagating from src/services/autocontrol/NvCameraIspDriver.cpp, function updateStats(), line 323)
SCF: Error BadParameter:  (propagating from src/components/stages/StatsUpdateStage.cpp, function doHandleRequest(), line 170)
SCF: Error BadParameter:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error BadParameter: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 990)

These appear before the rapid repetition of:

SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)

Edit: also hit this once (before the repeating error messages started):

SCF: Error NotSupported: AMR Sample data type is error, requested type is IspRawStats* (in src/components/amr/Sample.cpp, function typeError(), line 65)
SCF: Error NotSupported:  (in src/components/amr/Sample.cpp, function get(), line 101)
SCF: Error NotSupported:  (propagating from src/common/Amr.h, function getSampleObject(), line 488)
SCF: Error NotSupported:  (propagating from src/components/ac_stages/AeAfApplyStage.cpp, function translateIspOutStatsToFrd(), line 281)
SCF: Error NotSupported:  (propagating from src/components/ac_stages/AeAfApplyStage.cpp, function doHandleRequest(), line 618)
SCF: Error NotSupported:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
SCF: Error NotSupported: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 990)
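
When comparing many such traces across runs, it can help to turn each line into a structured record. The sketch below is an assumption-laden helper, not part of Argus: the regular expression is inferred only from the log lines quoted in this thread, and the names ScfError and parse_scf_line are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class ScfError(NamedTuple):
    source: str        # "SCF" or "Argus"
    kind: str          # e.g. "BadParameter", "InvalidState"
    message: str       # free text, often empty for propagated errors
    propagating: bool  # True for "(propagating from ...)", False for "(in ...)"
    file: str
    function: str
    line: int


# Pattern inferred from the log format seen in this thread only.
LINE_RE = re.compile(
    r"(?P<source>SCF|\(Argus\)):? Error (?P<kind>\w+):\s*(?P<message>.*?)\s*"
    r"\((?P<how>propagating from|in) (?P<file>\S+), "
    r"function (?P<function>\w+)\(\), line (?P<line>\d+)\)"
)


def parse_scf_line(text: str) -> Optional[ScfError]:
    """Parse one SCF/Argus error line; return None if it does not match."""
    m = LINE_RE.search(text)  # search() skips any syslog prefix
    if m is None:
        return None
    return ScfError(
        source=m.group("source").strip("()"),
        kind=m.group("kind"),
        message=m.group("message"),
        propagating=m.group("how") == "propagating from",
        file=m.group("file"),
        function=m.group("function"),
        line=int(m.group("line")),
    )
```

Grouping the parsed records by (kind, file, line) makes it easier to see whether two failures share the same originating frame, such as IspLegacyStatsDecoder.cpp line 58 in the traces above.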

Sorry, it looks like I replied to the wrong thread.
The problem caused by stress should be fixed in r32.2.
We have run stress together with argus_camera without problems.

From my testing (and ben.lemond’s based on his comments) it’s definitely not fixed. However, it does seem harder to trigger with 32.2 vs 32.1.

To be clear, there are two separate issues:

A.) Argus errors happening under high CPU load.
B.) Endless, high rate spamming of the 2 error messages after an error occurs.

Could you describe exactly what was changed between 32.1 and 32.2 that was supposed to address issues A and B?

Hopefully tomorrow I can post a simple app that reproduces the issues (in combination with the stress app).
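
To keep issues A and B apart when attaching logs to a report, a captured log can be split into the errors that came before the repetition started and a count of the repeated line. This is a rough sketch only: the marker string is copied from the logs in this thread, and the function name and the "everything before the first repeated line is the trigger" rule are assumptions.

```python
# Marker copied from the repeated log line quoted in this thread.
SPAM_MARKER = "Session has suffered a critical failure"


def split_issues(lines):
    """Return (trigger_lines, spam_count) for a captured log.

    trigger_lines: error lines seen before the first repeated
                   critical-failure line (issue A).
    spam_count:    how many times the critical-failure line appears (issue B).
    """
    trigger, spam_count = [], 0
    for line in lines:
        if SPAM_MARKER in line:
            spam_count += 1
        elif spam_count == 0 and "Error" in line:
            # Only errors logged before the spam starts count as the trigger.
            trigger.append(line.rstrip("\n"))
    return trigger, spam_count
```

The trigger lines are what a report on issue A needs, while a large spam_count over a short capture is the evidence for issue B.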

hi all,

We have already made some architecture changes, hence you’ll find r32.2 showing better test results.
I suggest starting a new discussion thread to track the latest status.
thanks

“Better” testing results are not “better” if the underlying issue still exists and the error condition can still happen, which it does with regard to your auto ISP functionality. The only thing you have successfully done is cover up an issue so that it happens less regularly in office testing but can still potentially happen in a production scenario.

You all keep focusing on the wrong problem. The high CPU load is not the problem. It is an accelerant. The problem still exists and is causing a lock-up in the ISP codebase. We are currently working with a “partner” of Nvidia’s on this issue and they are also not getting much support because of this problem.

As I have said before, we are in the process of changing our hardware so that we do not have to use the Argus API for ISP control. This is an absolute shame because the rest of the functionality NVIDIA provides is very high quality. However, we absolutely cannot have programs that segfault or die in any way under the covers without proper error recovery.

I created a new thread https://devtalk.nvidia.com/default/topic/1065378/jetson-tx2/argus-errors-with-high-cpu-load-and-subsequent-issues/ for discussion.

Thanks, let’s move to Topic 1065378 to get things done.