Best framework for low latency "camera to CUDA"

I am settling on an approach for my video processing, and it looks like I will be using the MMAPI. There is this example for grabbing video from a camera and processing it with CUDA:

https://docs.nvidia.com/jetson/l4t-multimedia/l4t_mm_v4l2_cam_cuda_group.html

In the experts’ opinion, is this going to be the best way to give me low latency CUDA processing of video frames?

i.e. I want the smallest possible delay between changes happening in the real world and being able to process an image of them in CUDA.

Thank you in advance for your assistance.

Hi,
If your camera source is a v4l2 source, 12_camera_v4l2_cuda is the best way. If the source is a Bayer sensor, you can refer to 09_camera_jpeg_capture and 10_camera_recording.
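For reference, here is a minimal sketch of the capture side of 12_camera_v4l2_cuda, using plain mmap'd V4L2 buffers for clarity (the actual sample imports DMA buffers with V4L2_MEMORY_DMABUF so frames can reach CUDA without a copy). The device node /dev/video0, the 1920x1080 resolution, and the UYVY format are assumptions; adjust them for your sensor, and add error checking for real use.

```cpp
// Minimal V4L2 capture loop: open device, negotiate format, mmap buffers,
// stream, dequeue frames. Standard V4L2 ioctls only; error checks trimmed.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/videodev2.h>
#include <cstdio>

int main() {
    int fd = open("/dev/video0", O_RDWR);   // assumed device node
    if (fd < 0) { perror("open"); return 1; }

    v4l2_format fmt = {};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 1920;               // assumed resolution
    fmt.fmt.pix.height = 1080;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_UYVY;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    v4l2_requestbuffers req = {};
    req.count = 4;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    void *bufs[4];
    for (unsigned i = 0; i < req.count; i++) {
        v4l2_buffer buf = {};
        buf.type = req.type; buf.memory = req.memory; buf.index = i;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        bufs[i] = mmap(nullptr, buf.length, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, buf.m.offset);
        ioctl(fd, VIDIOC_QBUF, &buf);       // hand buffer to the driver
    }

    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);

    for (int frame = 0; frame < 100; frame++) {
        v4l2_buffer buf = {};
        buf.type = req.type; buf.memory = req.memory;
        ioctl(fd, VIDIOC_DQBUF, &buf);      // blocks until a frame is ready
        // bufs[buf.index] now holds one UYVY frame; hand it to CUDA here.
        ioctl(fd, VIDIOC_QBUF, &buf);       // return buffer to the driver
    }

    ioctl(fd, VIDIOC_STREAMOFF, &type);
    close(fd);
    return 0;
}
```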

Do you think this is going to be the best way to give me low latency CUDA processing of video frames?

Hi,
Yes. Aside from the latency of the v4l2 source itself, this path is optimal, with zero memcpy.
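To make the zero-memcpy claim concrete, here is a hedged sketch of the mapping step in 12_camera_v4l2_cuda: the dmabuf fd dequeued from V4L2 is wrapped in an EGLImage and registered with CUDA, so the CPU never copies pixel data. It assumes the nvbuf_utils API from older JetPack releases (newer releases use NvBufSurface instead), an initialized EGLDisplay, and a current CUDA context; process_frame_on_gpu is a hypothetical helper name and error handling is omitted.

```cpp
// Zero-copy handoff: dmabuf fd -> EGLImage -> CUDA-mapped frame.
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <cuda.h>
#include <cudaEGL.h>
#include "nvbuf_utils.h"   // NvEGLImageFromFd / NvDestroyEGLImage

// Hypothetical helper; dmabuf_fd is the fd dequeued from V4L2.
void process_frame_on_gpu(EGLDisplay display, int dmabuf_fd)
{
    // Wrap the captured DMA buffer in an EGLImage (no pixel copy).
    EGLImageKHR image = NvEGLImageFromFd(display, dmabuf_fd);

    // Register the EGLImage with CUDA and map it as an EGL frame.
    CUgraphicsResource resource = nullptr;
    cuGraphicsEGLRegisterImage(&resource, image,
                               CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

    CUeglFrame frame;
    cuGraphicsResourceGetMappedEglFrame(&frame, resource, 0, 0);
    // frame.frame.pPitch[0] now points at plane 0 of the frame in
    // device-accessible memory; launch your CUDA kernel on it here.

    cuCtxSynchronize();                 // finish kernels before unmapping
    cuGraphicsUnregisterResource(resource);
    NvDestroyEGLImage(display, image);
}
```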

Thanks.

If I need to capture from a CSI camera, does that use V4L2 as well, or must I use libargus?

Hi,

If your CSI camera is a Bayer sensor and requires the hardware ISP engine, you must use libargus.
If the camera has an onboard ISP and outputs frames in a YUV422 format (such as UYVY or YUYV), you can capture through the v4l2 interface, just as 12_camera_v4l2_cuda demonstrates.
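Before committing to the v4l2 path, you can check whether the camera actually exposes a YUV422 packed format by enumerating its formats. This sketch uses only standard V4L2 ioctls; /dev/video0 is an assumption.

```cpp
// List every pixel format the capture device advertises; look for
// UYVY or YUYV in the output to confirm the v4l2 path applies.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>
#include <cstdio>

int main() {
    int fd = open("/dev/video0", O_RDWR);   // assumed device node
    if (fd < 0) { perror("open"); return 1; }

    v4l2_fmtdesc desc = {};
    desc.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    while (ioctl(fd, VIDIOC_ENUM_FMT, &desc) == 0) {
        // pixelformat is a fourcc code, printable as 4 characters.
        printf("format %u: %s (%.4s)\n", desc.index,
               (const char *)desc.description,
               (const char *)&desc.pixelformat);
        desc.index++;
    }
    close(fd);
    return 0;
}
```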

Thanks.