Processing NvVideoDecoder output on a different thread


I’m using the Multimedia API to decode video from a file. I would like to use multiple threads to post-process each frame, but I am having trouble getting it to work with the code below:

while (!m_decoder->isInError())
{
    v4l2_buffer v4l2_buf = {};
    v4l2_plane planes[MAX_PLANES] = { 0 };

    v4l2_buf.m.planes = planes;

    NvBuffer* decBuffer;

    int ret;

    {
        // Serialize access to the capture plane with the worker threads.
        std::lock_guard<std::mutex> lock(m_decodeMutex);
        ret = m_decoder->capture_plane.dqBuffer(v4l2_buf, &decBuffer, nullptr, 0);
    }

    if (ret < 0)
    {
        if (errno == EAGAIN)
        {
            // Non-blocking mode: no decoded frame is ready yet.
            continue;
        }

        std::cerr << "Error while calling dequeue at capture plane\n";
        break;
    }

    v4l2_buf.m.planes[0].m.fd = m_dmaBufferFds[v4l2_buf.index];

    BufferInfo bufferInfo {};

    bufferInfo.fileDescriptor = v4l2_buf.m.planes[0].m.fd;
    bufferInfo.size = v4l2_buf.m.planes[0].bytesused;

    // Post-process the frame on its own thread, then return the buffer to the decoder.
    std::thread([bufferInfo, this](v4l2_buffer&& v4l2_buf) mutable {

        if (m_callback)
        {
            m_callback(bufferInfo);
        }

        std::lock_guard<std::mutex> lock(m_decodeMutex);

        if (m_decoder->capture_plane.qBuffer(v4l2_buf, nullptr) < 0)
        {
            std::cerr << "Error while queueing buffer at decoder capture plane\n";
        }
    }, std::move(v4l2_buf)).detach();
}

It works fine if I don’t create a new thread and process everything sequentially. With the threads, I often get failures when queuing buffers:

nvbuf_utils: dmabuf_fd 0 mapped entry NOT found
nvbuf_utils: Can not get HW buffer from FD... Exiting...

Am I going about this all wrong? Is there an example in the sample code? The closest I could find was the backend sample, but that just processes multiple files on different threads, not a single file.

Any help appreciated.

It looks like you are running a more complex case than the reference samples cover. Could you share a patch against one of the samples so that we can build and run it to reproduce the issue and check further? If backend is close to your use case, please share a patch against it.

It’s not really close to my use case. I attempted to create a patch against an existing sample, but nothing is actually that close to what I am doing.

I guess my question is at a general level: is this a supported use case for the Multimedia API? That is, dequeuing a buffer on one thread, passing that buffer to another thread to have transforms and the like applied to it, and then queuing it back?

Are the methods on NvV4l2ElementPlane thread safe?


OK, I created a patch from the 00_video_decode sample. I’ll upload it. I run it with:

./video_decode MPEG4 -ww 500 -wh 250 --blocking-mode 0 <video>

I discovered that if I add a sleep in the main thread before dequeuing the buffer, everything seems to run fine.

Obviously in my real code I wouldn’t be creating so many threads, I’m just trying to keep the changes simple.
video_decode_main.cpp (67.1 KB)
add_threads_patch.txt (10.4 KB)

Are you hitting the issue on r32.1 or r28.3?


There appears to be a bug in add_threads_patch.txt. Some variables are accessed concurrently by the original thread and the newly created threads:

struct v4l2_buffer v4l2_capture_buf;
struct v4l2_plane capture_planes[MAX_PLANES];

NvBuffer *capture_buffer = NULL;

Please try declaring local variables in the newly created threads, and make sure the values are copied into those local variables before the next dqBuffer() call.
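
To illustrate the suggestion, here is a minimal sketch of giving each worker thread its own copy of the buffer descriptor. It does not use the real headers: Plane and Buffer are hypothetical stand-ins for v4l2_plane and v4l2_buffer, and OwnedBuffer and run_demo are names made up for this example. The point it tries to show is that copying the buffer struct alone is not enough, because the struct only holds a pointer to the plane array. The planes have to be copied as well, and the pointer redirected to the copy.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins for v4l2_plane / v4l2_buffer. As with the real
// structs, the buffer only points at a plane array owned by someone else.
constexpr std::size_t kMaxPlanes = 3;

struct Plane {
    int fd = -1;
    std::uint32_t bytesused = 0;
};

struct Buffer {
    std::uint32_t index = 0;
    Plane* planes = nullptr;
};

// Owns a private copy of the descriptor and of its plane array, so it stays
// valid after the dequeuing thread reuses or destroys its own variables.
class OwnedBuffer {
public:
    explicit OwnedBuffer(const Buffer& src) : buf_(src) {
        if (src.planes != nullptr) {
            for (std::size_t i = 0; i < kMaxPlanes; ++i) {
                planes_[i] = src.planes[i];
            }
        }
        buf_.planes = planes_.data();  // point at our copy, not the caller's
    }

    // Copies must re-point to their own plane array as well.
    OwnedBuffer(const OwnedBuffer& other)
        : buf_(other.buf_), planes_(other.planes_) {
        buf_.planes = planes_.data();
    }

    OwnedBuffer& operator=(const OwnedBuffer& other) {
        buf_ = other.buf_;
        planes_ = other.planes_;
        buf_.planes = planes_.data();
        return *this;
    }

    Buffer& get() { return buf_; }

private:
    Buffer buf_;
    std::array<Plane, kMaxPlanes> planes_{};
};

// Simulates the dequeue loop: the same two variables are reused for every
// frame, and each worker receives its own OwnedBuffer by value.
std::vector<int> run_demo(std::size_t frames) {
    std::vector<int> seen(frames, -1);
    std::vector<std::thread> workers;

    Buffer buf;
    Plane planes[kMaxPlanes];

    for (std::size_t i = 0; i < frames; ++i) {
        // Stand-in for dqBuffer() overwriting the shared descriptor.
        buf = Buffer{};
        for (auto& p : planes) {
            p = Plane{};
        }
        buf.index = static_cast<std::uint32_t>(i);
        buf.planes = planes;
        planes[0].fd = 100 + static_cast<int>(i);

        OwnedBuffer owned(buf);  // copy taken before the next "dequeue"
        workers.emplace_back([owned, &seen, i]() mutable {
            // Stand-in for post-processing followed by qBuffer(owned.get(), ...).
            seen[i] = owned.get().planes[0].fd;
        });
    }

    for (auto& w : workers) {
        w.join();
    }
    return seen;
}
```

In the real code the same idea would apply to v4l2_buffer and its m.planes pointer, with the actual qBuffer() call made on the thread's own copy. Whether that is sufficient also depends on whether the NvV4l2ElementPlane methods can safely be called from several threads, which this sketch does not address.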