V4L2 Buffer Errors & Initialization Bug with Multi-Camera Sync on Jetson

Hello everyone,

We are facing a persistent issue with a multi-camera setup on a Jetson platform and are hoping to get some insights from the community, as we suspect the problem might be at a lower level than the camera-specific ROS drivers.

Our Setup:

  • Platform: Aetina Orin NX 16GB, AIB-SN41-1-A1

  • JetPack / L4T: 5.1.2 (R35.4.1)

  • Cameras: 2x e-con Systems e-CAM25_CUONX (AR0234 sensor)

  • Software: ROS1 Noetic

  • Goal: Synchronize two cameras using an external hardware trigger and timestamps in ROS.

The Core Problem: V4L2 Buffer Errors
When we run our ROS nodes to capture and synchronize the two camera streams, one of the streams will inevitably fail after a short period. We have been in contact with the camera manufacturer, and they’ve suggested this is a V4L2 buffer dequeue error. We have verified our trigger signal is clean and stable.

Key Finding: A Reproducible Initialization Order Bug
While debugging, we discovered a strict initialization order dependency. We suspect this may be related to how entity priority is handled in the NVCSI controller.

  • Working Order: If we initialize Camera A first, and then Camera B second, both streams work perfectly.

  • Failing Order: If we initialize Camera B first, then Camera A fails to start.

Summary & Question for the Community
This initialization bug makes us believe the root cause isn’t just in the ROS application layer but potentially deeper in the stack. The fact that the order in which we access nodes determines if a device works or not points towards a potential resource conflict, a race condition, or an issue in how the V4L2 kernel drivers or the underlying NVIDIA Argus stack handle multiple CSI devices being activated in quick succession.

  1. Has anyone in the community encountered similar V4L2 buffer errors or strange initialization dependencies with multi-camera setups on Jetson platforms?

  2. Could this behavior be related to the NVIDIA V4L2 drivers or the Argus daemon’s management of the CSI ports?

  3. Are there any recommended debugging steps at the dmesg or kernel level that could help us trace why one camera fails to initialize based on this sequence?

Any advice or insights would be greatly appreciated!

2025-08-22_20-56-56_dmesg.log (176.7 KB)

  1. Can these sensors run in unsync mode? Please try unsync mode and check whether the error still occurs.
  2. Set the timeout to infinite by changing the capture timeout to -1 in vi5_fops.c.

Thank you for the response.

  1. We tested the cameras in freerun mode, and they work without any buffer errors. However, our application requires precise multi-camera synchronization, which is why we must use the external hardware trigger.
  2. We already tested this modification during our initial troubleshooting, and it did not resolve the buffer errors.

Let me know if you need more information.

Hi,

As a suggestion, have you tried reproducing the problem using v4l2-ctl?

This way we can check whether the problem is related to your application. You could try the following command:

v4l2-ctl -d /dev/videoX --set-ctrl bypass_mode=0 --stream-mmap --stream-count=500

Enrique Ramirez
Embedded SW Engineer at RidgeRun
Contact us: support@ridgerun.com
Developers wiki: https://developer.ridgerun.com
Website: www.ridgerun.com

What’s the error?

Thank you for the helpful suggestion.

I tested both initialization orders using v4l2-ctl while the cameras were running in external_trigger mode. Both cameras streamed without any errors, regardless of which one was started first.
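For anyone who wants to repeat this check, here is a minimal sketch of how the two start orders could be scripted around the v4l2-ctl command suggested above. The device paths `/dev/video0` and `/dev/video1`, the helper names, and the `start_delay` parameter are assumptions for illustration and are not taken from our setup.

```python
import subprocess
import time


def build_stream_cmd(device, frames=500):
    """Return the v4l2-ctl argument list used earlier in this thread."""
    return [
        "v4l2-ctl", "-d", device,
        "--set-ctrl", "bypass_mode=0",
        "--stream-mmap", f"--stream-count={frames}",
    ]


def both_orders(cam_a, cam_b):
    """Return the two start orders to compare: A then B, and B then A."""
    return [(cam_a, cam_b), (cam_b, cam_a)]


def run_order(devices, frames=500, start_delay=1.0):
    """Start one v4l2-ctl process per device, in the given order.

    start_delay (hypothetical) is the pause in seconds between starts.
    Returns a dict mapping each device path to its process exit code.
    """
    procs = []
    for dev in devices:
        procs.append(subprocess.Popen(build_stream_cmd(dev, frames)))
        time.sleep(start_delay)
    return {dev: proc.wait() for dev, proc in zip(devices, procs)}


# Example (device paths are assumptions; requires the cameras and trigger):
# for order in both_orders("/dev/video0", "/dev/video1"):
#     print(order, run_order(order))
```

The exit codes alone may not tell the whole story, so dmesg should still be checked after each run.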

For context, our application’s camera handling logic is based on the package e-con Systems recommended: rqt_cam (link), specifically the logic within camera.cpp (link).

These are the specific V4L2 buffer flags that are set when the error occurs: V4L2_BUF_FLAG_ERROR and V4L2_BUF_FLAG_MAPPED. I also captured the buffer index and other details: (buffer index: 0, bytesused: 1843200, sequence: 0)
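As a rough illustration of how these values can be read, the sketch below decodes a raw `flags` word using the bit values from `<linux/videodev2.h>` and checks `bytesused` against an expected frame size. The helper names are hypothetical, and the 1280x720 resolution at 2 bytes per pixel is only an assumption that happens to match 1843200; the actual mode is not stated here.

```python
# Bit values as defined in the Linux UAPI header <linux/videodev2.h>.
V4L2_BUF_FLAGS = {
    0x00000001: "V4L2_BUF_FLAG_MAPPED",
    0x00000002: "V4L2_BUF_FLAG_QUEUED",
    0x00000004: "V4L2_BUF_FLAG_DONE",
    0x00000008: "V4L2_BUF_FLAG_KEYFRAME",
    0x00000010: "V4L2_BUF_FLAG_PFRAME",
    0x00000020: "V4L2_BUF_FLAG_BFRAME",
    0x00000040: "V4L2_BUF_FLAG_ERROR",
    0x00000100: "V4L2_BUF_FLAG_TIMECODE",
    0x00000400: "V4L2_BUF_FLAG_PREPARED",
}


def decode_buf_flags(flags):
    """Return the names of the known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(V4L2_BUF_FLAGS.items()) if flags & bit]


def expected_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one packed frame with no line padding."""
    return width * height * bytes_per_pixel


# The two flags reported above, combined into one word.
reported = 0x00000040 | 0x00000001
print(decode_buf_flags(reported))

# Assumption: 1280x720 at 2 bytes per pixel (for example a 16-bit YUV format).
print(expected_bytes(1280, 720, 2) == 1843200)
```

Per the V4L2 documentation, V4L2_BUF_FLAG_ERROR means the buffer was dequeued successfully but its data may be corrupted, and the buffer can be requeued, so an application may choose to drop that frame and keep streaming rather than stop.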

Is there any kernel error, such as a capture timeout?

Hi,

Here are the errors that appear when the stream fails. The full log is available in the first post of this thread for reference.

[Fri Aug 22 20:54:24 2025] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[Fri Aug 22 20:54:24 2025] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[Fri Aug 22 20:54:24 2025] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[Fri Aug 22 20:55:38 2025] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, err_data 131072
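To pull these events out of a longer dmesg log, something like the following sketch could be used. It only matches the four message tags visible in the lines above (`uncorr_err`, `err_rec`, `csi5_stream_close`, `corr_err`); the function name and the returned dictionary layout are hypothetical, and no meaning is assigned to the `err_data` bits beyond showing the value in hex.

```python
import re

# Message tags taken from the log lines quoted above.
EVENT_RE = re.compile(r"\b(uncorr_err|corr_err|err_rec|csi5_stream_close)\b")
TIMESTAMP_RE = re.compile(r"^\[([^\]]+)\]")
ERR_DATA_RE = re.compile(r"err_data (\d+)")
TIMEOUT_RE = re.compile(r"timed out after (\d+) ms")


def parse_capture_events(lines):
    """Extract capture-related events from dmesg lines (human-readable timestamps)."""
    events = []
    for line in lines:
        tag = EVENT_RE.search(line)
        if not tag:
            continue
        ts = TIMESTAMP_RE.match(line)
        err = ERR_DATA_RE.search(line)
        timeout = TIMEOUT_RE.search(line)
        events.append({
            "time": ts.group(1) if ts else None,
            "kind": tag.group(1),
            "err_data_hex": hex(int(err.group(1))) if err else None,
            "timeout_ms": int(timeout.group(1)) if timeout else None,
        })
    return events


SAMPLE = """\
[Fri Aug 22 20:54:24 2025] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[Fri Aug 22 20:54:24 2025] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[Fri Aug 22 20:54:24 2025] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[Fri Aug 22 20:55:38 2025] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, err_data 131072
""".splitlines()

for event in parse_capture_events(SAMPLE):
    print(event)
```

Run against the full log (for example, `parse_capture_events(open(path))` with `path` pointing at the saved dmesg file), this gives a timeline of timeouts, channel resets, and discarded frames that can be compared with the order in which the cameras were started.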

This message indicates that frame data could not be captured from the sensor.

It may be best to consult e-con Systems for a solution.

Yes, it looks to be a problem with the application used.

My theory is that the external trigger may not be activated or handled correctly by the application, causing the timeout errors as no data is received from the sensor.

As ShaneCC suggested, it would be better to ask e-con Systems, or to invest time reviewing the application to find the problem.
