MMAPI sample 12_camera_v4l2_cuda: video conversion takes too long

I am testing the MMAPI sample “12_camera_v4l2_cuda” with the following command:

./camera_v4l2_cuda  -d /dev/video0 -s 1920x1080 -f UYVY -r 60

I found that the video conversion is time-consuming, taking about 10 ms per frame, which seems abnormal. I added some timing prints and found that most of the time is spent in “plane->dqBuffer(v4l2_buf, &buffer, &shared_buffer, -1)” in dqThread.

void *
NvV4l2ElementPlane::dqThread(void *data)
{
    NvV4l2ElementPlane *plane = (NvV4l2ElementPlane *) data;
    const char *comp_name = plane->comp_name;
    const char *plane_name = plane->plane_name;

    PLANE_DEBUG_MSG("Starting DQthread");
    prctl (PR_SET_NAME, plane_name, 0, 0, 0);
    plane->stop_dqthread = false;
    while (!plane->stop_dqthread)
    {
        struct v4l2_buffer v4l2_buf;
        struct v4l2_plane planes[MAX_PLANES];
        NvBuffer *buffer;
        NvBuffer *shared_buffer;
        bool ret;

        memset(&v4l2_buf, 0, sizeof(v4l2_buf));
        memset(planes, 0, sizeof(planes));
        v4l2_buf.m.planes = planes;
        v4l2_buf.length = plane->n_planes;

        if (plane->dqBuffer(v4l2_buf, &buffer, &shared_buffer, -1) < 0)
        {
            if (errno != EAGAIN)
            {
                plane->is_in_error = 1;
            }
            if (errno != EAGAIN || plane->streamon)
            {
                ret = plane->callback(NULL, NULL, NULL, plane->dqThread_data);
            }
            if (!plane->streamon)
            {
                break;
            }
        }
        else
        {
            ret = plane->callback(&v4l2_buf, buffer, shared_buffer,
                    plane->dqThread_data);
        }
        if (!ret)
        {
            break;
        }
    }
    plane->stop_dqthread = false;

    plane->dqthread_running = false;
    PLANE_DEBUG_MSG("Exiting DQthread");
    return NULL;
}

Is this normal? How can I reduce the time?
I look forward to your reply!

Are you using a USB camera?
With a USB camera, the buffer copy takes a lot of time.

Hi waynezhu, we use a video interface over CSI on our own board. We capture YUV through V4L2, and the capture time is normal. Only the video conversion time is abnormal, so I don't think it is related to the buffer copy.
Thank you for your reply!


How do you measure the time taken per frame?

BTW, could you let me know which release you are using? I will test on my side.

Hi waynezhu,
I measure the time before and after “plane->dqBuffer(v4l2_buf, &buffer, &shared_buffer, -1)” in dqThread.

I use Jetpack-3.1.
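
For reference, a before/after measurement of this kind can be sketched as below. This is only an illustration, not code from the sample: elapsedMs and fakeBlockingDequeue are hypothetical names, and the fake dequeue simply sleeps to stand in for a call that waits until a buffer is ready. It shows why timing around a blocking call reports the waiting time together with the actual work.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: wall-clock milliseconds spent inside `call`.
inline double elapsedMs(const std::function<void()> &call)
{
    auto start = std::chrono::steady_clock::now();
    call();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical stand-in for a blocking dequeue: it first waits `wait_ms`
// for a buffer to become available, then spends `work_ms` on real work.
inline void fakeBlockingDequeue(int wait_ms, int work_ms)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms)); // idle wait
    std::this_thread::sleep_for(std::chrono::milliseconds(work_ms)); // "work"
}
```

Calling elapsedMs([] { fakeBlockingDequeue(10, 2); }) reports at least 12 ms even though only 2 ms of it is work, so a number obtained this way is an upper bound on the processing time rather than the processing time itself.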

I have measured the VIC's latency; it takes only 1.5-2 ms.

I think you need to review your measurement method.

My method:

  1. Print the time when a buffer is queued (qBuffer) on the output plane.
  2. Print the time when a buffer is dequeued (dqBuffer) on the capture plane.
  3. When queuing, write a value into the v4l2_buf->timestamp.tv_sec field so that the frame can be identified.
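
The three steps above could be sketched roughly as follows. This is a minimal illustration under assumptions, not the sample's code: FrameBuf is a hypothetical stand-in for the single v4l2_buffer field used, LatencyTracker is a made-up helper, and it assumes the converter carries the timestamp from the output-plane buffer over to the matching capture-plane buffer.

```cpp
#include <chrono>
#include <ctime>
#include <map>
#include <mutex>
#include <sys/time.h>

// Hypothetical stand-in for the one v4l2_buffer field used here.
struct FrameBuf
{
    struct timeval timestamp;
};

// Hypothetical helper that pairs each queued frame with its dequeue.
class LatencyTracker
{
public:
    using Clock = std::chrono::steady_clock;

    // Call just before queuing on the output plane:
    // tag the frame in timestamp.tv_sec and remember when it was queued.
    void onQueue(FrameBuf &buf)
    {
        std::lock_guard<std::mutex> lock(mu_);
        buf.timestamp.tv_sec = next_id_;
        buf.timestamp.tv_usec = 0;
        queued_[next_id_] = Clock::now();
        ++next_id_;
    }

    // Call right after dequeuing on the capture plane.
    // Returns the latency in microseconds, or -1 if the tag is unknown.
    long long onDequeue(const FrameBuf &buf)
    {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = queued_.find(buf.timestamp.tv_sec);
        if (it == queued_.end())
            return -1;
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                           Clock::now() - it->second).count();
        queued_.erase(it);
        return us;
    }

private:
    std::mutex mu_;                             // queue and dequeue run on different threads
    time_t next_id_ = 1;                        // frame tag stored in tv_sec
    std::map<time_t, Clock::time_point> queued_;
};
```

Because each frame is matched by its tag, the result is the time from queue to dequeue for that specific frame and does not include time the dequeue thread spent idle before the frame was submitted.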

Thank you for your reply. I will try your method.