V4L2 DMABUF driver over PCIe

Hi support,

I have developed a PCIe V4L2 driver that supports V4L2_MEMORY_MMAP, and it can capture video from PCIe (FPGA). I have disabled the PCIe MMU and reserved a specific address range for PCIe in the dts file.
Based on the tegra_multimedia_api sample 12_camera_v4l2_cuda, I used memcpy to copy the camera data into an NvBuffer, then did the format conversion and rendering. That works fine.
But I don’t think memcpy is an efficient approach, especially for a high-resolution, high-fps camera. Currently our camera runs at 1280x1024@60fps.
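
To put a rough number on the cost of the copy, the sketch below computes how many bytes a per-frame memcpy has to move. It assumes a packed YUV 4:2:2 format at 2 bytes per pixel (such as the YVYU used in the command further down); the function names are illustrative only and do not come from the sample.

```c
#include <stdint.h>

/* Assumption: packed YUV 4:2:2 (e.g. YVYU), 2 bytes per pixel. */
#define BYTES_PER_PIXEL 2u

/* Size of one frame in bytes. */
static uint64_t frame_bytes(uint32_t width, uint32_t height)
{
	return (uint64_t)width * height * BYTES_PER_PIXEL;
}

/* Bytes per second that a per-frame memcpy has to move. */
static uint64_t copy_bytes_per_second(uint32_t width, uint32_t height,
				      uint32_t fps)
{
	return frame_bytes(width, height) * fps;
}
```

Under that assumption, 1280x1024 at 60 fps comes to 2621440 bytes per frame and 157286400 bytes per second (150 MiB/s), each byte being read once and written once by the CPU.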

I noticed that the example code 12_camera_v4l2_cuda is based on V4L2_MEMORY_DMABUF, and that the NVIDIA vi4 driver supports “VB2_MMAP | VB2_DMABUF | VB2_READ | VB2_USERPTR”. I have changed my driver to support VB2_DMABUF, using "drivers/media/pci/dt3155/dt3155.c" as a reference.

When I used the 12_camera_v4l2_cuda sample to capture video, the kernel showed this error: “contiguous chunk is too small 69632/4194304 b”. The command was:
./camera_v4l2_cuda -d /dev/video2 -s 2048x1024 -f 60 -f YVYU

I am confused about NvBuffer. As I understand it, an NvBuffer should be physically contiguous, and its DMABUF fd is passed to the driver through the V4L2 QBUF API. What caused “contiguous chunk is too small”?
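
For context, the message comes from the DMABUF import path of videobuf2-dma-contig.c: after mapping the buffer for the capture device, it walks the resulting scatterlist and requires the first run of back-to-back DMA addresses to cover the whole plane. The user-space sketch below mimics that check. `struct seg` and both function names are stand-ins, not kernel API. With the PCIe MMU disabled, as described above, a buffer made of scattered physical pages would plausibly fail in this way, although nothing in this thread confirms that this is the cause.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one mapped scatterlist entry (not a kernel type). */
struct seg {
	uint64_t dma_addr; /* bus address the device would see */
	uint64_t len;      /* length of the entry in bytes */
};

/* Length of the first run of entries whose addresses follow each other. */
static uint64_t first_contiguous_run(const struct seg *segs, size_t n)
{
	uint64_t expected, size = 0;
	size_t i;

	if (n == 0)
		return 0;
	expected = segs[0].dma_addr;
	for (i = 0; i < n; i++) {
		if (segs[i].dma_addr != expected)
			break; /* gap: the contiguous run ends here */
		expected = segs[i].dma_addr + segs[i].len;
		size += segs[i].len;
	}
	return size;
}

/* Returns 1 if the plane fits in the first contiguous run, else 0. */
static int plane_is_contiguous(const struct seg *segs, size_t n,
			       uint64_t plane_size)
{
	return first_contiguous_run(segs, n) >= plane_size;
}
```

The numbers in the log fit this picture: 2048 x 1024 pixels at 2 bytes per pixel (YVYU) is 4194304 bytes, the plane size that was requested, while 69632 bytes is 17 pages of 4 KiB, which would be the length of the first contiguous run before the addresses jump.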

Is there any way to get the NvBuffer physical address directly? If so, I can pass this address to the FPGA DMA.
dt3155.c (18.5 KB)

hello Donel,

it seems you would like to access the video buffer for CUDA processing; please refer to
[NVIDIA Tegra Linux Driver Package]-> [Development Guide]-> [Related Documentation]-> [Accelerated GStreamer User Guide],
please check the [CUDA VIDEO POST-PROCESSING WITH GSTREAMER1.0] chapter,
the gst-nvivafilter plugin there lets you perform CUDA processing directly.

Hi Jerrychang,
I want to get video data from PCIe into an NvBuffer without memcpy, the same way tegra_multimedia_api/samples/12_camera_v4l2_cuda does.

hello Donel,

please use gst-nvivafilter; it allocates the NvBuffer directly.

In my opinion, when application software uses the V4L2_MEMORY_DMABUF flag, as 12_camera_v4l2_cuda does, it only needs to pass the DMABUF file descriptor to the V4L2 capture driver. But I noticed that vi4_fops.c still uses dma_alloc_coherent to allocate physical memory.
Can you explain this?

int vi4_channel_start_streaming(struct vb2_queue *vq, u32 count)
{
	struct tegra_channel *chan = vb2_get_drv_priv(vq);
	struct media_pipeline *pipe = chan->video.entity.pipe;
	int ret = 0, i;
	unsigned long flags;
	struct v4l2_ctrl *override_ctrl;
	struct v4l2_subdev *sd;
	struct device_node *node;
	struct sensor_mode_properties *sensor_mode;
	struct camera_common_data *s_data;
	unsigned int emb_buf_size = 0;

	ret = media_entity_pipeline_start(&chan->video.entity, pipe);
	if (ret < 0)
		goto error_pipeline_start;

	if (chan->bypass) {
		ret = tegra_channel_set_stream(chan, true);
		if (ret < 0)
			goto error_set_stream;
		return ret;
	}

	spin_lock_irqsave(&chan->capture_state_lock, flags);
	chan->capture_state = CAPTURE_IDLE;
	spin_unlock_irqrestore(&chan->capture_state_lock, flags);

	if (!chan->pg_mode) {
		sd = chan->subdev_on_csi;
		node = sd->dev->of_node;
		s_data = to_camera_common_data(sd->dev);

		if (s_data == NULL) {
			/* ... */ "Camera common data missing!\n");
			return -EINVAL;
		}

		/* get sensor properties from DT */
		if (node != NULL) {
			int idx = s_data->mode_prop_idx;

			emb_buf_size = 0;
			if (idx < s_data->sensor_props.num_modes) {
				sensor_mode = /* ... */;

				chan->embedded_data_width = /* ... */;
				chan->embedded_data_height = /* ... */;
				/* rounding up to page size */
				emb_buf_size =
					round_up(chan->embedded_data_width *
						chan->embedded_data_height *
						/* ... */);
			}
		}

		/* Allocate buffer for Embedded Data if need to */
		if (emb_buf_size > chan->vi->emb_buf_size) {
			/*
			 * if old buffer is smaller than what we need,
			 * release the old buffer and re-allocate a bigger
			 * one below
			 */
			if (chan->vi->emb_buf_size > 0) {
				/* ... */
				chan->vi->emb_buf_size = 0;
			}

			chan->vi->emb_buf_addr =
				dma_alloc_coherent(/* ... */
					&chan->vi->emb_buf, GFP_KERNEL);
			if (!chan->vi->emb_buf_addr) {
				/* ... */ "Can't allocate memory for embedded data\n");
				goto error_capture_setup;
			}
			chan->vi->emb_buf_size = emb_buf_size;
		}
	}
	/* ... */
}

My PCIe V4L2 driver is based on vb2 and supports VB2_DMABUF. I think this should get video data into the NvBuffer without memcpy.

ok, I will try it.

Hi JerryChang,
Sorry for the late reply.
I think gst-nvivafilter is not what we expected. We developed a V4L2 PCIe driver based on vb2, and we want to use sample No. 12 in tegra_multimedia_api to capture video over PCIe and send it to an NvBuffer for processing. But when we capture video, we get the “contiguous chunk is too small” error message.

Why doesn’t the MIPI V4L2 driver, which is also based on vb2, have this problem?


I am also getting the “contiguous chunk is too small” error from videobuf2-dma-contig.c whilst trying to use vb2 buffers in my PCIe driver.

Did you ever get anywhere with this?



Hi Donel,

I am working on a similar use case; I am referring to the drivers/media/pci/dt3155 V4L2 PCIe driver from the NVIDIA kernel source and the 12_camera_v4l2_cuda sample.

I want to read frames from an FPGA (connected to two cameras) via the V4L2 PCIe driver and get the data into NvBuffers, making it available to the hardware-accelerated GStreamer plugins such as the NVIDIA encoders.

We are expecting 4K frames @ 60 fps, so for better performance we want to avoid a memcpy.
Have you been successful in doing this transfer using DMABUF in a vb2-based V4L2 driver?

It would be great if you could give me pointers to this work.