Is concurrent copy and execute possible when using V4L2 to dequeue capture buffers?

I would like to capture buffers off my webcam and process them concurrently.

The problem is that the V4L2 interface works through ioctls that return a full frame buffer at a time. I would like to do this:

Is it possible to somehow perform buffer dequeue on parts of an image so that once those buffers have been obtained from the webcam, they can be processed while the remaining parts are being transferred?
Hope this makes sense.

We would suggest using the NvBuffer APIs to copy to another buffer. They leverage the hardware VIC engine and are fast. The use case is similar to 12_camera_v4l2_cuda. Please install through SDKManager and check
For more information about tegra_multimedia_api, please look at

Hi, that is the example I am basing my code on. :-)

It is fast, but when you use V4L2 it can only dequeue a WHOLE buffer at a time. I would like to request “dequeue the first 50% of the buffer, then I will pass that to CUDA while you dequeue the second 50% of the buffer”.

Is this possible?

We may not support this case. The frames captured through V4L2 are complete frames.

Thanks for your reply. Sounds fair. I think this would involve a rewrite of the V4L2 driver by Nvidia.