This is a continuation of the previous topic, with the focus on making nvargus-daemon more robust: when errors occur, it should either clean up the video pipeline or die gracefully and let systemd restart it.
On every release from R32.4.3 through the latest, R32.5.0, nvargus-daemon fails to recover (that is, to clean up the connections) when a camera error occurs while two or more cameras are streaming. If only a single camera is streaming when the error occurs, nvargus-daemon crashes with a segmentation fault, systemd automatically restarts it, and it becomes usable again. This does not happen when two or more cameras are streaming.
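The difference between the two cases comes down to what systemd can observe. Its `Restart=` logic reacts to the main process terminating (or to a timeout systemd itself enforces, such as a watchdog the service must ping), so a segfault gives it something to act on, while a deadlocked process that stays alive does not, assuming no such watchdog is configured for the unit. The Python sketch below illustrates this with two stand-in child processes, not nvargus-daemon itself; `observe`, `CRASH`, and `HANG` are hypothetical names used only for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical stand-ins for the daemon's two failure modes (not nvargus-daemon).
CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"  # like the single-camera segfault
HANG = "import threading; threading.Event().wait()"  # blocks forever, like the multi-camera deadlock


def observe(child_code: str, timeout: float = 5.0) -> str:
    """Run child_code in a new interpreter and report what a supervisor could see."""
    proc = subprocess.Popen([sys.executable, "-c", child_code])
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process never exits, so an exit-driven restart policy never triggers.
        proc.kill()
        proc.wait()
        return "still running: no exit, so nothing triggers Restart="
    # On POSIX, a negative return code means the child was killed by that signal.
    if rc == -signal.SIGSEGV:
        return "killed by SIGSEGV: unclean exit, Restart=on-failure would fire"
    return f"exited with code {rc}"
```

Running `observe(CRASH)` reports the unclean exit almost immediately, whereas `observe(HANG)` only returns once the supervisor gives up waiting, which mirrors why the multi-camera case needs a manual restart.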
Here is a link to an even older topic where camera errors bring down the entire video pipeline and make it unusable:
Please note that this topic is not about fixing the camera streaming errors themselves. It is about making nvargus-daemon robust enough to recover the video pipeline when errors do occur (and in a real system, they will).
To reproduce the issue, all that is required is to stream from two cameras using GStreamer or any other application. Once the cameras are streaming, force an error condition through any of the following methods:
- Physically disconnect the camera (e.g., via the SerDes, FPD-Link, GMSL, or CSI ribbon cable)
- Disconnect camera power (e.g., via an I2C write or GPIO)
- Reset the image sensor
- Inject errors into the CSI packets
- Change the image sensor clock
- Change the image sensor (CSI transmitter) settings to cause an error
- Et cetera
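For the two-camera streaming step, a minimal Python sketch is shown below. It assumes two Argus cameras on sensor IDs 0 and 1 and a 1920x1080 at 30 fps mode; the caps string, `pipeline_cmd`, and `start_streams` are illustrative assumptions, so adjust them to a mode your sensors actually support. Each camera gets its own `gst-launch-1.0` process feeding `nvarguscamerasrc` into a `fakesink`, so both clients share the same nvargus-daemon.

```python
from __future__ import annotations

import shlex
import subprocess

# Assumption: a sensor mode both cameras support; change to match your hardware.
CAPS = "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1"


def pipeline_cmd(sensor_id: int) -> list[str]:
    """Build a gst-launch-1.0 argv that streams one Argus camera into a fakesink."""
    # Passed as argv (no shell), so the caps string needs no extra quoting.
    return ["gst-launch-1.0", "-e",
            "nvarguscamerasrc", f"sensor-id={sensor_id}", "!",
            CAPS, "!", "fakesink"]


def start_streams(sensor_ids: tuple[int, ...] = (0, 1)) -> list[subprocess.Popen]:
    """Launch one pipeline per camera; each keeps running until it is interrupted."""
    procs = []
    for sid in sensor_ids:
        print("launching:", shlex.join(pipeline_cmd(sid)))
        procs.append(subprocess.Popen(pipeline_cmd(sid)))
    return procs
```

On the Jetson, call `start_streams()`, confirm both pipelines are running, and then apply any one of the methods listed above.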
Any one of these methods should cause nvargus-daemon to deadlock and remain unusable until it is manually restarted. Again, these are artificial ways to reproduce the issue; in the real use case the errors come from ESD, electrical noise, vibration, shock, and cosmic rays, which will be difficult for you to reproduce.
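Until nvargus-daemon handles this itself, the manual restart can in principle be automated from the application side. The Python sketch below is a stopgap under stated assumptions, not the fix this topic asks for: it assumes the application records a `time.monotonic()` timestamp per camera whenever a frame arrives (for example, from a GStreamer pad probe), and treats a camera with no frame for `STALL_TIMEOUT_S` seconds as deadlocked. `stalled_cameras`, `restart_argus`, `watchdog_step`, and the 5-second threshold are all hypothetical.

```python
from __future__ import annotations

import subprocess
import time

STALL_TIMEOUT_S = 5.0  # assumption: seconds without a frame that count as a deadlock


def stalled_cameras(last_frame_time: dict[int, float], now: float,
                    timeout: float = STALL_TIMEOUT_S) -> list[int]:
    """Return the sensor IDs whose most recent frame is older than `timeout` seconds."""
    return sorted(cid for cid, t in last_frame_time.items() if now - t > timeout)


def restart_argus() -> None:
    """Automate the manual restart; requires root privileges.

    Client pipelines still have to be torn down and rebuilt afterwards.
    """
    subprocess.run(["systemctl", "restart", "nvargus-daemon"], check=True)


def watchdog_step(last_frame_time: dict[int, float], now: float | None = None) -> bool:
    """Restart nvargus-daemon if any camera has stalled; return True if a restart was issued."""
    now = time.monotonic() if now is None else now
    if stalled_cameras(last_frame_time, now):
        restart_argus()
        return True
    return False
```

Calling `watchdog_step` periodically (for example, once per second) keeps the decision logic in `stalled_cameras` separate from the side effect, which makes the threshold easy to tune without touching the restart path.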