Argus Streaming STATUS_CANCELLED

Can this be reproduced with the 30W power modes?

Hi Jerry,

In our use case, on JetPack 5.1.0 we only see the STATUS_CANCELLED error type, but on JetPack 5.1.2 we see several types, such as STATUS_CANCELLED, STATUS_UNAVAILABLE, and STATUS_TIMEOUT.

Can we treat these error types on JetPack 5.1.2 as different ways of reporting one common defect?

The detailed logs are as follows:

STATUS_UNAVAILABLE:

SCF: Error InvalidState:  Corr Error Received for sensor 0 .. Continuing!
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 643)

STATUS_TIMEOUT:

SCF: Error Timeout: Sending critical error event for Session 2
 (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
SCF: Error Timeout: Sending critical error event for Session 1
 (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 734)
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 1, capture sequence ID = 11846 draining session frameStart events 1
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameStart(), line 532)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 110)
SCF: Error InvalidState: Sensor 1 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
SCF: Error InvalidState: Sensor 0 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
2023-10-21 11:27:32.098 ERROR [213623] [streaming::StreamMgr::ErrorOperation@223] sensor_device-1, ErrorOperation: STATUS_TIMEOUT, level: 3
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 0, capture sequence ID = 11844 draining session frameEnd events 3
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 635)
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device:
SCF: Error Timeout: Sending critical error event for Session 0
 (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device: ^A, frame pool size: 0
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 734)
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device: ^B, frame pool size: 0
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 0, capture sequence ID = 11846 draining session frameStart events 1
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameStart(), line 532)
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device: ^D, frame pool size: 16
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device: ^E, frame pool size: 16
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device: ^F, frame pool size: 16
2023-10-21 11:27:32.098 INFO  [213623] [streaming::StreamMgr::ErrorOperation@230] device: ^G, frame pool size: 16
SCF: Error Timeout: Sending critical error event for Session 0
 (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
SCF: Error InvalidState: Sensor 0 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 0, capture sequence ID = 11845 draining session frameEnd events 2
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 635)
SCF: Error InvalidState: Sensor 0 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 0, capture sequence ID = 11846 draining session frameEnd events 1
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 635)
SCF: Error InvalidState: Sensor 2 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 2, capture sequence ID = 11844 draining session frameEnd events 3
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 635)
SCF: Error InvalidState: Sensor 2 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 2, capture sequence ID = 11845 draining session frameEnd events 2
 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 635)
SCF: Error InvalidState: Sensor 2 already in same state
 (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 2, capture sequence ID = 11846 draining session frameEnd events 1
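A quick way to test the "one common defect" question against logs like these is to group the `Timeout!! Skipping requests` lines by capture sequence ID. In the excerpt above, sensors 0 and 2 both report sequence IDs 11844 and 11845, and sensors 0, 1 and 2 all report 11846, which suggests the sessions stalled together rather than independently. The C++ sketch below assumes the exact message wording quoted here; `groupTimeoutsBySequence` is a hypothetical helper, not part of Argus or SCF, and co-occurrence alone does not identify a root cause.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: maps capture sequence ID -> sensor GUIDs that logged a
// "Timeout!! Skipping requests" message for that sequence.
std::map<long, std::set<int>> groupTimeoutsBySequence(const std::vector<std::string>& lines) {
    // Pattern assumed from the SCF messages quoted in this thread.
    static const std::regex kTimeout(
        R"(Timeout!! Skipping requests on sensor GUID (\d+), capture sequence ID = (\d+))");
    std::map<long, std::set<int>> bySequence;
    for (const auto& line : lines) {
        std::smatch m;
        if (std::regex_search(line, m, kTimeout)) {
            const int guid = std::stoi(m[1].str());
            const long seq = std::stol(m[2].str());
            bySequence[seq].insert(guid);
        }
    }
    return bySequence;
}
```

Fed all eight "Skipping requests" lines from the log above, it yields 11844 → {0, 2}, 11845 → {0, 2} and 11846 → {0, 1, 2}.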

JP-5.1.2 added new status reporting.
For example, STATUS_CANCELLED is used to report aborted captures,
whereas STATUS_UNAVAILABLE is reported if a resource is already allocated (a buffer cannot be registered with a HW channel) or an invalid state is detected.
STATUS_TIMEOUT is reported when a condition wait has timed out.
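The three statuses described above can be tallied from the application log to compare JP-5.1.0 and JP-5.1.2 runs. This C++ sketch assumes the `streaming::StreamMgr` line format shown in the log excerpt (that logger belongs to the poster's application, not to Argus); `tallyStatuses` and `describeStatus` are hypothetical helpers, and the descriptions only paraphrase the reply above rather than any official Argus documentation.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: paraphrases the meaning of each status as explained in
// the reply above; not taken from official Argus documentation.
std::string describeStatus(const std::string& status) {
    if (status == "STATUS_CANCELLED") return "capture was aborted";
    if (status == "STATUS_UNAVAILABLE")
        return "resource already allocated or invalid state";
    if (status == "STATUS_TIMEOUT") return "a condition wait timed out";
    return "not covered in this thread";
}

// Hypothetical helper: counts status names in StreamMgr lines such as
// "... sensor_device-1, ErrorOperation: STATUS_TIMEOUT, level: 3".
std::map<std::string, int> tallyStatuses(const std::vector<std::string>& lines) {
    static const std::regex kStatus(R"(ErrorOperation: (STATUS_[A-Z_]+))");
    std::map<std::string, int> counts;
    for (const auto& line : lines) {
        std::smatch m;
        if (std::regex_search(line, m, kStatus)) ++counts[m[1].str()];
    }
    return counts;
}
```

The `ErrorOperation@230] device:` INFO lines do not match the pattern, so only the ERROR lines carrying a status name are counted.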

Yes, this can still be reproduced using different power modes.

Hi Jerry,

On JP-5.1.2, we see plenty of STATUS_TIMEOUT errors, which we never saw on JP-5.1.0. Another strange thing is that after a certain bad cold boot of the Orin, the STATUS_TIMEOUT error occurs repeatedly, every several minutes (after a normal cold boot, STATUS_TIMEOUT only occurs every several hours).
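To put numbers on "every several minutes" versus "every several hours", the gaps between consecutive STATUS_TIMEOUT lines can be computed from the log timestamps. This is a minimal C++ sketch assuming each relevant line starts with the `YYYY-MM-DD HH:MM:SS.mmm` timestamp used by the StreamMgr output above; timestamps are treated as naive local times (no time-zone or DST handling), and all function names are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Days since 1970-01-01 for a proleptic Gregorian date (H. Hinnant's days_from_civil).
long long daysFromCivil(int y, unsigned m, unsigned d) {
    y -= m <= 2;
    const int era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return static_cast<long long>(era) * 146097 + static_cast<long long>(doe) - 719468;
}

// Parses a leading "YYYY-MM-DD HH:MM:SS.mmm" (three-digit milliseconds assumed)
// into milliseconds; returns false if the line does not start with one.
bool parseTimestampMs(const std::string& line, long long& outMs) {
    int y, mo, d, h, mi, s, ms;
    if (std::sscanf(line.c_str(), "%d-%d-%d %d:%d:%d.%d", &y, &mo, &d, &h, &mi, &s, &ms) != 7)
        return false;
    const long long days = daysFromCivil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d));
    outMs = ((days * 24 + h) * 60 + mi) * 60 * 1000LL + s * 1000LL + ms;
    return true;
}

// Returns the gaps, in seconds, between consecutive timestamped lines that
// mention STATUS_TIMEOUT.
std::vector<double> timeoutGapsSeconds(const std::vector<std::string>& lines) {
    std::vector<double> gaps;
    bool havePrev = false;
    long long prevMs = 0;
    for (const auto& line : lines) {
        long long ms = 0;
        if (line.find("STATUS_TIMEOUT") == std::string::npos || !parseTimestampMs(line, ms))
            continue;
        if (havePrev) gaps.push_back((ms - prevMs) / 1000.0);
        prevMs = ms;
        havePrev = true;
    }
    return gaps;
}
```

Comparing the gap lists from a "bad" and a normal cold boot would show whether the difference is a steady rate change or bursts of errors.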

Hi Jerry,

We've also exported enableCamInfiniteTimeout=1 before running streaming. Is it expected that it doesn't take effect in this case?

Hi dear Jerry,

Have you already reproduced this issue on your side?
Looking forward to your reply!

Hello Krisss,

This is still under investigation.

We're able to reproduce the same failure by running the camera use case with the GPU stressed.
Even though we could work around this failure by disabling a couple of CUDA services (in the camera stack), it's still unknown how the GPU stress triggers this error.

Hi Jerry,

Do you have any update on this issue? Thank you.

Hello ting.chang,

Please give the attached SCF binary a try: Topic269543_Nov20_libnvscf.zip (2.3 MB).
This is a pre-built update with a couple of CUDA services (in the camera stack) disabled.

Hi Jerry,

Could you please elaborate on the target fixes of this patch? Thanks!

I tested 6 cameras with this libnvscf.so under high GPU stress for three days.
Unfortunately, the same problem happened on the third day…

Hello ting.chang,

I believe the pre-built update in comment #33 already improves stability under GPU stress, right?

FYI,
we have seen some camera long-run stability issues (runs longer than 24 hours); they are still under investigation.
May I know what your criteria/expectations are, or what your real use case is for long runs with the GPU stressed?

Hi Jerry,

If we stress both the CPU and GPU, the problem happens within 12 hours.
Our expectation is normal operation for at least 12 hours.

Hello ting.chang,

Have you also tried system-level configuration?
For example, you may use taskset to assign CPU resources, and renice to give the camera process a higher priority.
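For reference, `taskset` and `renice` are built on the Linux `sched_setaffinity()` and `setpriority()` calls, so the same settings can be applied from code. This is a minimal sketch, not a recommended configuration: which CPUs and nice value to use are assumptions to tune per system, lowering the nice value needs root or CAP_SYS_NICE, and on Linux both settings apply per thread (`taskset -a` covers all threads of a process). If the application talks to nvargus-daemon, the daemon's threads may be the ones worth targeting; that is an assumption to verify. All function names here are hypothetical.

```cpp
#include <sched.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <cerrno>
#include <vector>

// Like `taskset -p <pid>`: returns the CPUs the thread `pid` may run on
// (pid 0 = calling thread). Empty on error.
std::vector<int> allowedCpus(pid_t pid) {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(pid, sizeof(set), &set) != 0) return cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
    return cpus;
}

// Like `taskset -p -c <cpus> <pid>`: restricts one thread to the listed CPUs.
// Returns 0 on success, otherwise the errno value.
int pinToCpus(pid_t pid, const std::vector<int>& cpus) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu : cpus)
        if (cpu >= 0 && cpu < CPU_SETSIZE) CPU_SET(cpu, &set);
    return sched_setaffinity(pid, sizeof(set), &set) == 0 ? 0 : errno;
}

// Like `renice -n <niceValue> -p <pid>`. Returns 0 on success, otherwise errno.
int setNice(pid_t pid, int niceValue) {
    return setpriority(PRIO_PROCESS, static_cast<id_t>(pid), niceValue) == 0 ? 0 : errno;
}

// Reads the current nice value; getpriority() can legitimately return -1,
// so errno is cleared first to tell errors apart.
bool getNice(pid_t pid, int& out) {
    errno = 0;
    const int value = getpriority(PRIO_PROCESS, static_cast<id_t>(pid));
    if (value == -1 && errno != 0) return false;
    out = value;
    return true;
}
```

For example, `pinToCpus(0, {2, 3})` restricts the calling thread to CPUs 2 and 3, and `setNice(pid, -5)` roughly matches `renice -n -5 -p <pid>`.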

BTW,
could you please check with your real use case instead of running a utility that fully occupies CPU and GPU resources?

Hi Jerry,

The problem happens when we use a Bayer camera with nvargus.
If we test YUV cameras without nvargus, there are no problems under full load. We want to keep the same criteria for tests with and without nvargus.