Managing a disconnected MIPI camera with the Libargus Camera API

I am looking for the correct way to terminate an Argus camera application when a camera is disconnected, and to recover once it is reconnected.
For context, my two cameras are connected via two TI ds90ub95x SerDes links. As a result, the coax link between the serializer and the deserializer can be disconnected by bad agents.
Here is my setup:
| camera 0 | -MIPI-> | SER0(ds90ub953) | -FPD-Link III-> | DER0(ds90ub954) | -MIPI-> Jetson Nano
| camera 1 | -MIPI-> | SER1(ds90ub953) | -FPD-Link III-> | DER1(ds90ub954) | -MIPI-> Jetson Nano

I have modified the 13_multi_camera sample to test this problem. My current approach is to use the Status enum and a timeout to detect that a camera is no longer connected. Once I get a timeout, I attempt to exit by doing the following.

...
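        // Wait for each capture session to go idle, then release its holder.
        for (uint32_t k = 0; k < streamCount; k++) {
            ICaptureSession *iCaptureSession =
                interface_cast<ICaptureSession>(captureHolders[k].get()->getSession());
            // interface_cast returns NULL if the interface is unavailable.
            if (iCaptureSession)
                iCaptureSession->waitForIdle();
            captureHolders[k].reset();
        }
        // Stop the consumer, then release the camera provider and the renderer.
        consumerThread.shutdown();
        g_cameraProvider.reset();
        delete g_renderer;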
...
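
For reference, the timeout handling that comes before this teardown can be sketched on its own. This is a minimal sketch, not the code from the patch: `AcquireStatus` and `DisconnectDetector` are hypothetical names, `AcquireStatus` stands in for the `Argus::Status` value reported by `acquireFrame()`, and the number of consecutive timeouts to tolerate is an assumed parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the status reported by acquireFrame().
// Real code would map Argus::Status values onto these cases.
enum class AcquireStatus { Ok, Timeout, Other };

// Tracks consecutive acquire timeouts for a single camera stream.
class DisconnectDetector {
public:
    // maxConsecutiveTimeouts is an assumed tuning parameter (must be > 0).
    explicit DisconnectDetector(uint32_t maxConsecutiveTimeouts)
        : m_limit(maxConsecutiveTimeouts), m_count(0)
    {
        assert(m_limit > 0);
    }

    // Feed the status of each acquire attempt. Returns true once the
    // stream should be treated as disconnected.
    bool update(AcquireStatus status)
    {
        if (status == AcquireStatus::Ok) {
            m_count = 0;             // a good frame clears the timeout run
        } else if (status == AcquireStatus::Timeout) {
            if (m_count < m_limit)
                ++m_count;           // saturate at the limit
        } else {
            m_count = m_limit;       // assumption: any other error is fatal
        }
        return m_count >= m_limit;
    }

private:
    uint32_t m_limit;
    uint32_t m_count;
};
```

In the consumer loop, the status from each acquireFrame() call would be mapped to AcquireStatus and passed to update(); once it returns true, the application would start the teardown shown above. Requiring several consecutive timeouts is a design choice so that a single late frame is not treated as a disconnect.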

The current errors are:

(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 266)
(Argus) Error EndOfFile: Receive worker failure, notifying 2 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 340)
(Argus) Error InvalidState: Argus client is exiting with 2 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 357)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 368)
(Argus) Error EndOfFile: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error EndOfFile:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error EndOfFile: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error EndOfFile:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)

I then reattach the coax link and reload the drivers:
rmmod ds90ub954 (SerDes driver, works)
rmmod ar0231at (camera driver, works)
modprobe ds90ub954 (works)
modprobe ar0231at (works)
I test the camera that was not disconnected using nvgstcapture. (Works)

nvgstcapture-1.0 --sensor-mode=0 --sensor-id=0 

I test the camera that was disconnected. (Fails)

nvgstcapture-1.0 --sensor-mode=0 --sensor-id=1

With errors:

GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
nvbuf_utils: dmabuf_fd -1 mapped entry NOT found
nvbuf_utils: Can not get HW buffer from FD... Exiting...
CONSUMER: ERROR OCCURRED
ERROR on bus: by /GstPipeline:capture_native_pipeline/GstBin:cap_bin/GstNvArgusCameraSrc:nvarguscamerasrc0: CANCELLED

I have attached a patch for the 13_multi_camera sample.
I have also seen this post, but I haven’t seen any updates. https://forums.developer.nvidia.com/t/argus-capture-error-handling-is-broken/106952/27

[13_multi_camera_patch|attachment](upload://xu19xmPXxyoAdtoglmRHYEOyM7i) (4.6 KB)


Sorry to say, the Nano doesn’t support error recovery.

We have a project using the Xavier NX. Does the Xavier NX support error recovery? Is rebooting the only “recovery” method on the Nano? Or is there a driver that manages the dmabuf_fd, which I could build as a module and reload? It would be great if I could avoid rebooting the Nano every time I get a camera error. Is there any workaround that avoids rebooting?

You can try to listen the sensor to terminal camera APP and restart the nvargus-daemon before next launch.

I am not sure what you mean by “you can try to listen the sensor to terminal camera APP”. I am using the status enum from acquireFrame() to detect a camera fault; on a timeout I then terminate the app, as you will see in the patch provided. I have tried restarting the nvargus-daemon after reloading the SerDes and camera driver modules, but I still get the error:

nvbuf_utils: dmabuf_fd -1 mapped entry NOT found

Which driver/service is responsible for configuring this dmabuf_fd? If it is a driver, can I build it as an LKM, and will reloading it help?

Please let me know which Jetson platforms support camera error recovery.

If you are able to detect that the sensor is disconnected, terminate the app.
I suppose a sensor LKM won’t help in this kind of case. TX2, Xavier, and Xavier NX all support it.

Which driver/service is responsible for configuring this dmabuf_fd? If it is a driver, can I build it as an LKM, and will reloading it help?

This message could appear because the capture is unable to get frame data from the sensor.

Hi Shane. It is obvious that we are not understanding each other.

  1. Given the situation I have described (if you need clarification, please say so), is rebooting the Nano the only way to recover? You can test this yourself with two Pi cameras and a development board.
  2. Why can’t the dmabuf_fd entry be found for the camera that was disconnected and reconnected?
  3. How is this issue addressed on the TX2/Xavier/Xavier NX?

  1. Try restarting nvargus-daemon instead of rebooting the system:
sudo service nvargus-daemon restart
  2. I think it could be that the failed capture left the DMA buffer locked.
  3. This implementation is in non-public source.

As I said in a previous post, restarting nvargus-daemon does not resolve the issue.
I have attached log files generated by following the steps detailed below:

dmesg -w | tee ~/dmesg.log
[dmesg.log|attachment](upload://zKYr1Kb0XzbJPHk4bAcOkkzu1er.log) (247.9 KB)

sudo modprobe ds90ub954 
sudo modprobe ar0231at
cd /usr/src/jetson_multimedia_api/samples/13_multi_camera
./multi_camera

Run the patched ./multi_camera and remove camera 2:

./multi_camera

Restart nvargus-daemon as suggested and reconnect the camera:

sudo journalctl -u nvargus-daemon.service
sudo rmmod ds90ub954
sudo modprobe ds90ub954
sudo rmmod ar0231at
sudo modprobe ar0231at
sudo journalctl -u nvargus-daemon >> ~/journallog.log

Restart nvargus-daemon after the cameras are reconnected, then try to capture from camera 1:

sudo systemctl restart nvargus-daemon
nvgstcapture-1.0 --sensor-mode=0 --sensor-id=0

Camera 1 works. Try to capture from camera 2:

nvgstcapture-1.0 --sensor-mode=0 --sensor-id=1

Camera 2 fails with the error: nvbuf_utils: dmabuf_fd -1 mapped entry NOT found

sudo journalctl -u nvargus-daemon >> ~/journallog2.log
sudo systemctl restart nvargus-daemon

Retry the capture from camera 1:
nvgstcapture-1.0 --sensor-mode=0 --sensor-id=0

It now fails with the error: nvbuf_utils: dmabuf_fd -1 mapped entry NOT found

[dmesg.log|attachment](upload://zKYr1Kb0XzbJPHk4bAcOkkzu1er.log) (247.9 KB)
[journallog.log|attachment](upload://vMiwX1xTp50OVGCtivlesoQZSlV.log) (11.5 KB)
[journallog2.log|attachment](upload://eHKxQBv3qCTo9cg0i11N2ypkxTp.log) (25.0 KB)

The Nano may need a reboot to recover from some critical camera errors.

I see the logs didn’t attach. See below:
dmesg.log (247.9 KB)
journallog.log (11.5 KB)
journallog2.log (25.0 KB)