Using Argus::EGL_STREAM_MODE_FIFO causes ICaptureSession::waitForIdle() to hang in a multi-camera scenario

Platform: Xavier
JetPack: 4.2 (stuck on 4.2 for the moment because of camera driver support)
Camera: IMX577 (Leopard Imaging)

(Note: I have included a sample program to replicate my issue, attached. It can be copied to tegra_multimedia_api/argus/samples. Just edit tegra_multimedia_api/argus/CMakeLists.txt to include the samples/argus_multi_fifo directory.)

Our application stitches images from several cameras in real time on the Xavier.
I recently changed our Argus camera configuration code to use EGL_STREAM_MODE_FIFO instead of the default mailbox mode, and since then the application hangs in waitForIdle().

Old code:

iEGLStreamSettings->setMode(Argus::EGL_STREAM_MODE_MAILBOX);

New code:

iEGLStreamSettings->setMode(Argus::EGL_STREAM_MODE_FIFO);
iEGLStreamSettings->setFifoLength(2);
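
For readers unfamiliar with the two modes, the difference can be pictured with a toy model. This is only a sketch of the semantics described in the EGLStream extension specs (mailbox keeps the newest frame, FIFO queues frames and makes the producer wait when full). `ToyStream` and `acceptedWithoutConsumer` are hypothetical names, and nothing here reflects how Argus implements either mode internally.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

enum class StreamMode { Mailbox, Fifo };

// Toy model of a stream's pending-frame storage (hypothetical, not Argus code).
class ToyStream {
public:
    ToyStream(StreamMode mode, std::size_t fifoLength)
        : mode_(mode), capacity_(mode == StreamMode::Mailbox ? 1 : fifoLength) {}

    // Producer side. Returns false when the producer would have to wait.
    bool insert(int frame) {
        if (pending_.size() == capacity_) {
            if (mode_ == StreamMode::Fifo) return false;  // FIFO full: producer stalls
            pending_.pop_front();                         // mailbox: replace the unacquired frame
        }
        pending_.push_back(frame);
        return true;
    }

    // Consumer side. Returns false when nothing is pending.
    bool acquire(int& frame) {
        if (pending_.empty()) return false;
        frame = pending_.front();
        pending_.pop_front();
        return true;
    }

private:
    StreamMode mode_;
    std::size_t capacity_;
    std::deque<int> pending_;
};

// Counts how many of n inserts are accepted when nobody consumes.
int acceptedWithoutConsumer(StreamMode mode, int n) {
    ToyStream stream(mode, 2);  // FIFO length 2, as in the settings above
    int accepted = 0;
    for (int i = 0; i < n; ++i) {
        if (stream.insert(i)) ++accepted;
    }
    return accepted;
}
```

In this model a mailbox never refuses a frame, while a FIFO of length 2 refuses every frame after the second until the consumer acquires one. Whether that kind of stall is what sits behind the hang reported here is not established in this thread.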

I then launch repeat captures:

iCaptureSession->repeat(myrequest.get());

Later, I stop the repeat captures. Ever since I set the EGL stream mode to FIFO, my code seems to hang in the call to ICaptureSession::waitForIdle() while stopping the cameras. I have several capture sessions, and it doesn’t always hang on the first one.

iCaptureSession->stopRepeat();
iCaptureSession->waitForIdle();

If I wait long enough, the attached program generates the following error messages:

Stopping Camera 0:

(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)

Stopping Camera 1

(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)

Followed by:

(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 266)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 3 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 340)
(Argus) Error InvalidState: Argus client is exiting with 3 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 357)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 368)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)

Based on the attached source code, is there anything I need to ensure before I call waitForIdle()?
Note: the code is hard-coded to grab the first sensor mode and run at 30 fps. You may need to modify it to match your camera capabilities.
argus_multi_fifo.tar.gz (3.97 KB)

Other things perhaps worth mentioning:

  • I interface with the EGL stream using the CUDA EGL consumer interop functions (e.g. cudaEGLStreamConsumerAcquireFrame)
  • I tried modifying argus_multistream to use EGL_STREAM_MODE_FIFO. It seems to recover correctly. One difference I noticed was that the JPEG stream uses repeated calls to capture() instead of a single call to repeat()

Hi,
In the Argus implementation, we have verified mailbox mode only; FIFO mode may not work properly. Are you able to run your use case in mailbox mode?

OK, thanks for the information.

I saw that FIFO mode is used in some of the tegra_multimedia_api examples: a search for EGL_STREAM_MODE_FIFO revealed its use in cudaBayerDemosaic and yuvJpeg. I guess they work because they use capture() instead of repeat(). I’ll keep that in mind.

I have a workaround where I set up a thread to copy the mailbox’s eglFrame to another image queue.

By the way, just curious: what happens if I call cudaEGLStreamConsumerAcquireFrame twice in a row (passing different cudaGraphicsResource pointers to populate)? Will the mailbox allocate a new buffer if I don’t release the first one? I’ll give it a try, but is it acceptable?

I resolved this issue (for now) by implementing a thread dedicated to pulling images out of the mailbox as soon as they are available. It then performs some light preprocessing and puts the result into an actual queue from which the rest of the program consumes its frames.
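
The hand-off described above might look roughly like the following sketch. It uses only the C++ standard library so it can be run anywhere. `Frame`, `FrameQueue`, `pumpFrames`, and `demoPump` are hypothetical names, and the `acquire` callback stands in for whatever code pulls a frame out of the mailbox, preprocesses it, and releases it (the CUDA EGL consumer calls are deliberately not shown).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a preprocessed copy of a mailbox frame.
struct Frame { int sequence; };

// Bounded queue. When full, the oldest frame is dropped so the pump never blocks.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(const Frame& frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (frames_.size() == capacity_) frames_.pop_front();
            frames_.push_back(frame);
        }
        ready_.notify_one();
    }

    // Blocks until a frame arrives; returns false once closed and drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !frames_.empty() || closed_; });
        if (frames_.empty()) return false;
        out = frames_.front();
        frames_.pop_front();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::size_t capacity_;
    std::deque<Frame> frames_;
    bool closed_ = false;
    std::mutex mutex_;
    std::condition_variable ready_;
};

// Pump loop: keeps draining the source until asked to stop, then closes the queue.
void pumpFrames(const std::function<bool(Frame&)>& acquire,
                FrameQueue& queue, const std::atomic<bool>& stop) {
    while (!stop.load()) {
        Frame frame{};
        if (acquire(frame)) queue.push(frame);
    }
    queue.close();
}

// Demo with a fake source that yields five frames and then requests shutdown.
int demoPump() {
    FrameQueue queue(8);
    std::atomic<bool> stop(false);
    int next = 0;
    auto fakeAcquire = [&](Frame& f) -> bool {
        if (next == 5) { stop = true; return false; }
        f.sequence = next++;
        return true;
    };
    std::thread pump([&] { pumpFrames(fakeAcquire, queue, stop); });

    int received = 0;
    Frame f{};
    while (queue.pop(f)) {
        assert(f.sequence == received);  // frames arrive in order
        ++received;
    }
    pump.join();
    return received;
}
```

Dropping the oldest frame when the queue is full is one possible policy, chosen here so the pump thread is never held up by a slow consumer; a real application might prefer to block or to drop the newest frame instead.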

BareMetalCoder: does your dedicated thread use individual capture() calls, or do you use repeat()? Thanks!

Wow. Sorry. Accidentally blew away my last post with this new interface. Very sensitive!

@D3_cwhite: My program launches repeat() and then fires off a thread that calls cudaEGLStreamConsumerAcquireFrame() and cudaEGLStreamConsumerReleaseFrame() in a loop until it is told to shut down.

I’m still seeing issues with repeat()…
I recently created a unit test that does the following

(Usual setup, including the creation of a request object, configuring the EGL Stream to Mailbox, and connecting a client thread)

loop {
    iCaptureSession->repeat(request.get());
    (wait 1 second)
    iCaptureSession->stopRepeat();
    iCaptureSession->waitForIdle();
    (wait 1 second)
}

Disconnect the client

The first 8 iterations or so seem to work well. I can see my capture thread grabbing the expected number of frames.
However, after that point, I see fewer and fewer frames get captured at each iteration.
After about 10 iterations, no images are captured, and I start seeing the same problem I was seeing before - waitForIdle hangs until the driver times out. Nothing works after that point and I usually have to call sudo systemctl restart nvargus-daemon to get things going again.

If I disable waitForIdle(), I still see the frame rate drop as described above, until finally no frames are generated, and the Acquire calls time out.
Almost feels like we’re gradually bleeding resources.

Is it legal to perform the operations in a loop, like I am doing above? Do I need to clean up other things after stopRepeat()? Looking at the samples, I don’t see any extra steps, but then again, the existing samples don’t seem to call repeat() again after stopRepeat().

I’m now using JetPack 4.3, by the way.

OK, I figured it out!
It would appear that I had a path in my consumer thread code that didn’t release the acquired EGL frame back to the stream (cudaEGLStreamConsumerReleaseFrame) before shutting down. I’m guessing the EGL stream/mailbox eventually ran out of buffers to capture images to.
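
One way to guard against a missed release on some exit path is a small scope guard, sketched below with the standard library only. `ScopedRelease`, `FakeStream`, and `processOneFrame` are hypothetical names; `FakeStream` is just a counter standing in for the stream's buffers, and in real code the guard's callback would wrap the actual release call.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs the given callback exactly once when the guard leaves scope,
// whichever return path (or exception) ends that scope.
class ScopedRelease {
public:
    explicit ScopedRelease(std::function<void()> release)
        : release_(std::move(release)) {}
    ~ScopedRelease() { if (release_) release_(); }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

private:
    std::function<void()> release_;
};

// Hypothetical buffer counter standing in for the stream's frame buffers.
struct FakeStream {
    int freeBuffers;
    bool acquire() {
        if (freeBuffers == 0) return false;
        --freeBuffers;
        return true;
    }
    void release() { ++freeBuffers; }
};

// Handles one frame. shuttingDown models the early-exit path that leaked a frame.
bool processOneFrame(FakeStream& stream, bool shuttingDown) {
    if (!stream.acquire()) return false;
    ScopedRelease guard([&stream] { stream.release(); });
    if (shuttingDown) return false;  // early return: the guard still releases
    // ... process the frame here ...
    return true;
}

// Runs a normal frame and a shutdown frame, then reports the free buffer count.
int demoFreeBuffersAfterEarlyExit() {
    FakeStream stream{2};
    processOneFrame(stream, false);
    processOneFrame(stream, true);
    return stream.freeBuffers;
}
```

With the guard in place, both buffers are back in the pool after the early exit; without it, the shutdown path would leave one buffer permanently checked out.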

Thanks very much! I haven’t tried the cudaEGLStream* functions — sounds like I need to give them a shot.