hi,
we’re using NVIDIA DRIVE AGX Xavier-AD (Quanta) with 4x “IMX490 Rev 8 RGGB” (5.4MP) cameras over GMSL2 links, in an automotive environment.
we’re facing a problem where we sometimes start seeing the following logs:
../gst-libs/gst/nvmedia/sipl/nvsiplnotificationhandler.cpp LogEvent:67: Notification handler: ICP capture timeout.
../gst-libs/gst/nvmedia/sipl/nvsiplnotificationhandler.cpp LogEvent:67: Notification handler: ICP capture timeout.
...
multiple times, followed by a warning from our code about callbacks not being called (suggesting we’re stuck reading a frame).
within about 2s of that, we can see kernel logs about USB devices being disconnected.
then after ~10s we see another problem reported:
Error received from element nvmediasiplsrc: Internal data stream error.. Debug info: ../subprojects/gstreamer/libs/gst/base/gstbasesrc.c(3177): gst_base_src_loop (): /GstPipeline:camera-pipeline-port-b/GstBin:nvidia_video_source_back/GstNvMediaSiplSrc:nvmediasiplsrc:
streaming stopped, reason error (-5)
...
4 times in a row (1 per camera?), followed by more “Notification handler: ICP capture timeout” messages.
then about 20-30s later, we can see de-initialization code failing:
Module_id 23 Severity 2 : Failed to close isp channel
Module_id 23 Severity 2 : NvMediaISPDestroy: Failed to close isp channel
camera_node: /dvs/git/dirty/git-master_linux/camera/fusa/sipl/src/core/pipelineMgr/spmgr/pipeline/blocks/CNvMISPWrapperDefault.hpp: 232: operator(): Couldn't destroy NvMediaISP handle. nvmStatus:7
again 4 times, but this time at intervals of a few seconds.
after that, neither USB nor the cameras are usable anymore.
we do not believe this is a cabling issue, as:
- power cycling restores the system to a healthy state.
- in all reproductions both USB and cameras (GMSL) go off at almost exactly the same time (under 2s difference).
- “regular” USB disconnection and reconnection events are handled correctly and without losing the cameras.
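
for reference, the "under 2s difference" observation can be checked mechanically. below is a minimal sketch, not our production tooling: it assumes a merged log in which every line carries a dmesg-style `[ seconds.micros]` timestamp prefix (an assumption; the SIPL lines quoted above have no timestamps as shown), and it matches the kernel's "USB disconnect" wording. the function names and the sample lines are hypothetical and synthetic.

```python
import re

# Assumed (hypothetical) log format: each line starts with a dmesg-style
# "[ seconds.micros]" timestamp. Real logs may need a different parser.
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

def first_event_times(lines):
    """Return the first timestamp seen for each event class in the log."""
    first = {}
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip lines without a timestamp prefix
        ts, msg = float(m.group(1)), m.group(2)
        if "ICP capture timeout" in msg:
            first.setdefault("camera", ts)
        elif "USB disconnect" in msg:
            first.setdefault("usb", ts)
    return first

def failure_gap(lines):
    """Seconds from first camera timeout to first USB disconnect, or None."""
    t = first_event_times(lines)
    if "camera" in t and "usb" in t:
        return t["usb"] - t["camera"]
    return None

# Synthetic sample lines, for illustration only (not taken from a real capture).
sample = [
    "[  100.000000] Notification handler: ICP capture timeout.",
    "[  100.500000] Notification handler: ICP capture timeout.",
    "[  101.250000] usb 1-2: USB disconnect, device number 5",
]
print(failure_gap(sample))
```

a consistently small gap across reproductions would support the idea of a common cause rather than two independent failures.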
we have trouble reproducing this problem in the lab, but we observe it occasionally in the field.
is this a known problem? if not - how can it be debugged? if yes - how can it be addressed / mitigated?