Camera crashes nvargus-daemon

I am streaming video using GStreamer and nvarguscamerasrc from an IMX219 Raspberry Pi camera on a Jetson Nano with a custom carrier board running L4T 32.5.0.

Moving the camera cable can crash both the video stream and nvargus-daemon. Immediately before the crash, horizontal streaks appear on single rows of the image. The camera device node (/dev/video0) then disappears, and it’s impossible to recover the video stream without rebooting the board.

GStreamer pipeline

#Server
rtsp-streamer "nvarguscamerasrc ! video/x-raw(memory:NVMM),format=NV12,width=1280,height=720,framerate=60/1 ! queue ! nvv4l2h264enc num-B-Frames=0 maxperf-enable=true ! queue ! rtph264pay config-interval=1 pt=96 name=pay0"

#Receiver
gst-launch-1.0 rtspsrc location=rtsp://10.1.1.10:8554/test latency=0 ! queue ! parsebin ! queue ! decodebin ! queue ! autovideosink

root@jetson:~# journalctl -f -b -u nvargus-daemon
-- Logs begin at Sun 2020-09-20 10:43:58 UTC. --
Sep 20 10:44:52 jetson systemd[1]: Started NVIDIA Argus daemon.
Sep 20 10:45:15 jetson nvargus-daemon[4006]: === NVIDIA Libargus Camera Service (0.97.3)=== Listening for connections...=== rtsp-streamer[4064]: Connection established (7F744BD1C0)OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module0
Sep 20 10:45:15 jetson nvargus-daemon[4006]: OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module1
Sep 20 10:45:15 jetson nvargus-daemon[4006]: OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
Sep 20 10:45:15 jetson nvargus-daemon[4006]: ---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
Sep 20 10:45:15 jetson nvargus-daemon[4006]: CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
Sep 20 10:45:15 jetson nvargus-daemon[4006]: ---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
Sep 20 10:45:15 jetson nvargus-daemon[4006]: (NvCamV4l2) Error ModuleNotPresent: V4L2Device not available (in /dvs/git/dirty/git-master_linux/camera/utils/nvcamv4l2/v4l2_device.cpp, function findDevice(), line 256)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: (NvCamV4l2) Error ModuleNotPresent:  (propagating from /dvs/git/dirty/git-master_linux/camera/utils/nvcamv4l2/v4l2_device.cpp, function initialize(), line 60)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: (NvOdmDevice) Error ModuleNotPresent:  (propagating from dvs/git/dirty/git-master_linux/camera-partner/imager/src/devices/V4L2SensorViCsi.cpp, function initialize(), line 107)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvPclDriverInitializeData: Unable to initialize driver v4l2_sensor
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvPclInitializeDrivers: error: Failed to init camera sub module v4l2_sensor
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvPclStartPlatformDrivers: Failed to start module drivers
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvPclStateControllerOpen: Failed ImagerGUID 0. (error 0xA000E)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvPclOpen: PCL Open Failed. Error: 0xf
Sep 20 10:45:15 jetson nvargus-daemon[4006]: SCF: Error BadParameter: Sensor could not be opened. (in src/services/capture/CaptureServiceDeviceSensor.cpp, function getSourceFromGuid(), line 582)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: SCF: Error BadParameter:  (propagating from src/services/capture/CaptureService.cpp, function addSourceByGuid(), line 437)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: SCF: Error BadParameter:  (propagating from src/api/CameraDriver.cpp, function addSourceByIndex(), line 303)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: SCF: Error BadParameter:  (propagating from src/api/CameraDriver.cpp, function getSource(), line 466)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: Acquiring SCF Camera device source via index 1 has failed. === rtsp-streamer[4064]: CameraProvider initialized (0x7f6cbbac20)SCF: Error BadValue: NvPHSSendThroughputHints (in src/common/CameraPowerHint.cpp, function sendCameraPowerHint(), line 56)
Sep 20 10:45:15 jetson nvargus-daemon[4006]: CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0
Sep 20 10:45:15 jetson nvargus-daemon[4006]: NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0
Sep 20 10:45:27 jetson nvargus-daemon[4006]: Error: waitCsiFrameStart timeout guid 1
Sep 20 10:45:27 jetson nvargus-daemon[4006]: ************VI/CSI Debug Registers**********
Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CFG_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CFG_INTERRUPT_STATUS_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CSI_0_ERROR_STATUS_0 = 0x00000002
Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CSI_0_ERROR_INT_MASK_0 = 0x0000001f
Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CSI_1_ERROR_STATUS_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CSI_1_ERROR_INT_MASK_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_A_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_A_STATUS_0 = 0x00004095
Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout: ISP port 0 timed out! (in src/services/capture/NvIspHw.cpp, function waitIspFrameEnd(), line 478)
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_B_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout:  (propagating from src/services/capture/NvIspHw.cpp, function waitIspFrameEnd(), line 524)
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_B_STATUS_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CIL_A_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CIL_A_STATUS_0 = 0x00000003
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CILA_STATUS_0 = 0x00030010
Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout: Worker thread IspHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CIL_B_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CIL_B_STATUS_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CILB_STATUS_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout: ISP Stats timed out! (in src/services/capture/NvIspHw.cpp, function waitIspStatsFinished(), line 566)
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_READONLY_STATUS_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 992)
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_DEBUG_CONTROL_0 = 0x75257300
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_DEBUG_COUNTER_0_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_DEBUG_COUNTER_1_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_DEBUG_COUNTER_2_0 = 0x00000000
Sep 20 10:45:27 jetson nvargus-daemon[4006]: *****************************************
Sep 20 10:45:29 jetson nvargus-daemon[4006]: Error: waitCsiFrameStart timeout guid 1
Sep 20 10:45:29 jetson nvargus-daemon[4006]: ************VI/CSI Debug Registers**********
Sep 20 10:45:29 jetson nvargus-daemon[4006]: VI_CFG_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: VI_CFG_INTERRUPT_STATUS_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: VI_CSI_0_ERROR_STATUS_0 = 0x00000002
Sep 20 10:45:29 jetson nvargus-daemon[4006]: VI_CSI_0_ERROR_INT_MASK_0 = 0x0000001f
Sep 20 10:45:29 jetson nvargus-daemon[4006]: VI_CSI_1_ERROR_STATUS_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: VI_CSI_1_ERROR_INT_MASK_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_A_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_A_STATUS_0 = 0x00004095
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_B_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_B_STATUS_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: SCF: Error Timeout: ISP Stats timed out! (in src/services/capture/NvIspHw.cpp, function waitIspStatsFinished(), line 566)
Sep 20 10:45:29 jetson nvargus-daemon[4006]: SCF: Error BadParameter: CC has already been disposed (in src/components/CaptureContainerManager.cpp, function dispose(), line 161)
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_CIL_A_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_CIL_A_STATUS_0 = 0x00000003
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_CILA_STATUS_0 = 0x00030010
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_CIL_B_INTERRUPT_MASK_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_CIL_B_STATUS_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_CILB_STATUS_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_CSI_READONLY_STATUS_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_DEBUG_CONTROL_0 = 0x75257300
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_DEBUG_COUNTER_0_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_DEBUG_COUNTER_1_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: CSI_DEBUG_COUNTER_2_0 = 0x00000000
Sep 20 10:45:29 jetson nvargus-daemon[4006]: *****************************************
Sep 20 10:45:29 jetson systemd[1]: nvargus-daemon.service: Main process exited, code=killed, status=11/SEGV
Sep 20 10:45:29 jetson systemd[1]: nvargus-daemon.service: Failed with result 'signal'.
Sep 20 10:45:29 jetson systemd[1]: nvargus-daemon.service: Scheduled restart job, restart counter is at 1.
Sep 20 10:45:29 jetson systemd[1]: Stopped NVIDIA Argus daemon.
Sep 20 10:45:29 jetson systemd[1]: Started NVIDIA Argus daemon.

root@jetson:~# journalctl -f -b -u camera-server-rtsp
-- Logs begin at Sun 2020-09-20 10:43:58 UTC. --
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState: Argus client is exiting with 1 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 357)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 368)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error EndOfFile: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error EndOfFile:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
Sep 20 10:45:29 jetson camera-server-rtsp.sh[4064]: (Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
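
The VI/CSI register dumps in the nvargus-daemon log above are easier to compare between crashes when condensed. The following is a minimal sketch, not an NVIDIA tool: it assumes journal lines in exactly the format shown above, and the helper names `parse_registers` and `nonzero_status` are hypothetical. It only extracts `NAME = 0x...` pairs and lists the status registers that are non-zero; it makes no attempt to interpret individual bits.

```python
import re

# Matches lines such as
# "... nvargus-daemon[4006]: VI_CSI_0_ERROR_STATUS_0 = 0x00000002"
REGISTER_RE = re.compile(
    r"nvargus-daemon\[\d+\]:\s+([A-Z0-9_]+)\s+=\s+0x([0-9a-fA-F]+)\s*$"
)


def parse_registers(lines):
    """Return {register_name: value}; a later dump overwrites an earlier one."""
    registers = {}
    for line in lines:
        match = REGISTER_RE.search(line)
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers


def nonzero_status(registers):
    """Return registers whose name contains STATUS and whose value is non-zero."""
    return {
        name: value
        for name, value in registers.items()
        if "STATUS" in name and value != 0
    }


# Sample lines copied from the log above.
sample = [
    "Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CSI_0_ERROR_STATUS_0 = 0x00000002",
    "Sep 20 10:45:27 jetson nvargus-daemon[4006]: VI_CSI_1_ERROR_STATUS_0 = 0x00000000",
    "Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_PIXEL_PARSER_A_STATUS_0 = 0x00004095",
    "Sep 20 10:45:27 jetson nvargus-daemon[4006]: CSI_CSI_CIL_A_STATUS_0 = 0x00000003",
    "Sep 20 10:45:27 jetson nvargus-daemon[4006]: SCF: Error Timeout: ISP port 0 timed out!",
]

for name, value in nonzero_status(parse_registers(sample)).items():
    print(f"{name} = 0x{value:08x}")
```

Saving the journal output to a file and passing its lines to `parse_registers` would give the same summary for a full capture.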

This problem occurs when using the standard 150mm flat-flex camera cable supplied with the camera module. I have also tried longer cables (300mm, 600mm) which appear to be more sensitive to the problem. If the cable is left untouched then the stream is stable and no corrupt image data comes through. The behaviour can also be reproduced by simply unplugging the cable. It’s even possible to manipulate the cable in such a way that the signal integrity is borderline (lots of horizontal lines appear) and to maintain this state indefinitely. I can then crash the daemon at will by further bending the cable.

I tried debugging this myself but the crash is inside nvargus-daemon, which is closed source (please release the sources!)

This connection needs to be tolerant to slight movements and other environmental interference for a robust deployment in production environments. It’s not acceptable that slight interference can bring down the entire system.

I would suggest gluing the connector and cable together with caulking.

In my first autonomous RC car, the camera would freeze because of the vibration caused by driving on asphalt. After gluing, the camera has not frozen even when the car crashed into a wall, and it continues to work reliably.

Thank you for the suggestion @naisy,

Unfortunately, the connector does need to be unpluggable for maintenance access in my application. I’m also sure that the problem is not caused by an open circuit but by external interference, whether noise injected by proximity to other circuits or changes to the characteristic impedance of the cable when its geometry changes.

I’d like someone from NVIDIA to look into this and provide a patch for nvargus-daemon, as it should never be the case that a program crashes - especially a system service. Any error should be gracefully handled and recovered from. Improvements to robustness from external measures (such as your suggestion) are good practice but should be applied in conjunction with hardening of the software to unexpected conditions.

I would recommend silicone caulking so it can be removed for maintenance.

I used a 200cm flexible cable and have never encountered the issues you are concerned about.
DC motors always generate spark noise, but noise at that level has had no effect on the camera.

As you say, it would be preferable for the daemon to be stable.
However, your environment means that it is not a problem if the camera is not available.
For operation in harsh environments, I would recommend using a camera system that is suited for that.

@naisy I tried your suggestion of gluing the cable into the connector at both ends (at the carrier board and at the camera module) and the problem did not go away. This confirms that it is not caused by the cable simply coming unplugged (though unplugging the cable is one way to replicate the problem).

I appreciate the suggestion; however, I’d really like a response from an NVIDIA representative.

It’s not sufficient to suggest “the camera cable must be suitable”. How does one quantify this? Consider what an engineer would do to source a suitable commercial off-the-shelf solution, or to design something fit for their specific application requirements. Is it a matter of having perfectly matched impedance according to the MIPI standard? Is the camera port even compliant with the MIPI standard? This aspect of the documentation is severely lacking (it’s not even mentioned in the camera design guide). In the absence of this information, the system should be robust enough that these considerations go away and picking something “close enough” is likely to result in a successful outcome.

I would suggest using a GMSL camera for long-distance use cases.

@ShaneCCC I have a future application in which I intend to take exactly this approach, which I think is the industry standard. Unfortunately, even if I had the resources to change the current design today, this would not fix my problem.

The SerDes has MIPI CSI on both ends of the link. What happens when there is some noise injected in this CSI signal? The data going into the ISP is still corrupt and as I’ve demonstrated, the nvargus-daemon will crash.

The only acceptable solution is going to be some additional error handling in the daemon to prevent the crash from happening when unexpected data is encountered.
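
Until the daemon handles such errors itself, the failure can at least be detected from outside it. The following is a minimal sketch under stated assumptions: the two signatures are copied from the journal output earlier in this thread, the function names (`classify`, `summarize`, `camera_node_present`) are hypothetical, and nothing here recovers the stream. It only reports that the known failure pattern has occurred.

```python
import os
import re

# Signatures copied from the journal output earlier in this thread.
CSI_TIMEOUT_RE = re.compile(r"waitCsiFrameStart timeout")
DAEMON_SEGV_RE = re.compile(
    r"nvargus-daemon\.service: Main process exited, code=killed, status=11/SEGV"
)


def classify(line):
    """Return 'csi_timeout', 'daemon_segv', or None for one journal line."""
    if CSI_TIMEOUT_RE.search(line):
        return "csi_timeout"
    if DAEMON_SEGV_RE.search(line):
        return "daemon_segv"
    return None


def summarize(lines):
    """Count each failure signature in an iterable of journal lines."""
    counts = {"csi_timeout": 0, "daemon_segv": 0}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


def camera_node_present(path="/dev/video0"):
    """Return True if the camera device node still exists."""
    return os.path.exists(path)


# Sample lines copied from the log earlier in this thread.
sample_lines = [
    "Sep 20 10:45:27 jetson nvargus-daemon[4006]: Error: waitCsiFrameStart timeout guid 1",
    "Sep 20 10:45:29 jetson nvargus-daemon[4006]: Error: waitCsiFrameStart timeout guid 1",
    "Sep 20 10:45:29 jetson systemd[1]: nvargus-daemon.service: Main process exited, code=killed, status=11/SEGV",
    "Sep 20 10:45:29 jetson systemd[1]: Started NVIDIA Argus daemon.",
]

print(summarize(sample_lines))
print("camera node present:", camera_node_present())
```

The lines could come from the same `journalctl -f -b -u nvargus-daemon` output shown earlier, which already includes the systemd messages for the unit. What to do once a failure is detected (alerting, a controlled reboot) is left to the application.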

The Nano currently doesn’t support error recovery.

Thanks

@ShaneCCC are you saying that this is a hardware error and it is incapable of being detected or resolved by the software?

I find it hard to believe that the software couldn’t at least fail in a sane manner (not emitting a segmentation violation would be a good start). If this is a hardware problem, it should be documented in the silicon errata.

Does the NX suffer from this same limitation?

The NX has implemented error recovery as of r32.7.x. I’m not sure whether r32.6.1 has the complete feature or not.
