VIDIOC_DQBUF blocks

Hi Chris,
Please get in contact with an NVIDIA salesperson so that we can review and prioritize this request.

Did you find any solution to your problem? I am having a similar issue.

This is what I am doing. I am grabbing H.264-encoded frames from an IP camera and parsing the encoded buffer myself to extract the SPS, PPS, and frame buffers. I then prepend the start code 0x00,0x00,0x00,0x01 to each of these and feed them to the decoder as NAL units. I am initializing the output_plane as

ctx[channel].dec->output_plane.setupPlane(V4L2_MEMORY_MMAP, 10, true, false);

But when all 10 output_plane buffers have been enqueued and the program tries to dequeue one before putting more frames into the queue, it gets stuck in

ret = v4l2_ioctl(fd, VIDIOC_DQBUF, &v4l2_buf);

of

int
NvV4l2ElementPlane::dqBuffer(struct v4l2_buffer &v4l2_buf, NvBuffer ** buffer,
NvBuffer ** shared_buffer, uint32_t num_retries)

function.

I tried both with and without
ret = ctx[channel].dec->disableCompleteFrameInputBuffer();

but no luck; it always gets stuck. The only difference is that with disableCompleteFrameInputBuffer() the v4l2_ioctl() call succeeds up to the 3rd frame (i.e., SPS, PPS, I-frame), while without it, it gets stuck after dequeuing only the SPS and PPS.

Need help.

BTW, for your information, my program at the moment only feeds H.264-encoded frames into the decoder. There is no code yet to read and use the decoded buffers.

More info:
This is what I get in my Qt application output window, in case it helps:

NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 7647: NvMMLiteBlockOpen
NvMMLiteBlockCreate : Block : BlockType = 261
TVMR: cbBeginSequence: 1179: BeginSequence 1280x720, bVPR = 0
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1529: DecodeBuffers = 6, pnvsi->eCodec = 4, codec = 0
TVMR: cbBeginSequence: 1600: Display Resolution : (1280x720)
TVMR: cbBeginSequence: 1601: Display Aspect Ratio : (1280x720)
TVMR: cbBeginSequence: 1669: ColorFormat : 5
TVMR: cbBeginSequence:1680 ColorSpace = NvColorSpace_YCbCr709
TVMR: cbBeginSequence: 1809: SurfaceLayout = 3
TVMR: cbBeginSequence: 1902: NumOfSurfaces = 13, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
TVMR: cbBeginSequence: 1904: BeginSequence ColorPrimaries = 1, TransferCharacteristics = 1, MatrixCoefficients = 1

Hi caruofc,

Try not to separate the SPS/PPS from the actual payload. Try to send them in a single payload like this:

0001SPS0001PPS0001<REAL_NALU…>
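A sketch of building such a single Annex-B payload (plain C++; the helper and function names are illustrative, not part of any NVIDIA API):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Illustrative helper: append one NAL unit preceded by the
// 4-byte Annex-B start code 0x00 0x00 0x00 0x01.
static void appendNalWithStartCode(std::vector<uint8_t>& out,
                                   const uint8_t* nal, size_t len)
{
    static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};
    out.insert(out.end(), startCode, startCode + 4);
    out.insert(out.end(), nal, nal + len);
}

// Build one payload of the form 0001 SPS 0001 PPS 0001 <frame NALU>,
// suitable for a single memcpy into the decoder's output-plane buffer.
std::vector<uint8_t> buildPayload(const std::vector<uint8_t>& sps,
                                  const std::vector<uint8_t>& pps,
                                  const std::vector<uint8_t>& frame)
{
    std::vector<uint8_t> payload;
    payload.reserve(sps.size() + pps.size() + frame.size() + 12);
    appendNalWithStartCode(payload, sps.data(), sps.size());
    appendNalWithStartCode(payload, pps.data(), pps.size());
    appendNalWithStartCode(payload, frame.data(), frame.size());
    return payload;
}
```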

Also, make sure you keep count of the enqueue and dequeue operations; otherwise it will block.

Lastly, I do not use the sample from NVIDIA. I find it full of workarounds. I’m better off investing time in understanding every single workaround rather than investing the same time or more in that sample code, which is very far from being a usable SDK-like sample.

Thanks shiretuxiv2i for your quick reply.
I did try sending the SPS/PPS in one payload as you described at first. That did not work either.
Here is my function, which fills the output_plane of the decoder; once the output buffers are all queued, it starts to dequeue and enqueue each new encoded NAL payload as it arrives.
There is no EOS whatsoever, as it is a continuous stream from the IP camera once connected.

What happens is that the first dqBuffer() operation succeeds, followed by qBuffer(), but when the next payload arrives, dqBuffer() blocks inside v4l2_ioctl() after hitting EAGAIN.

bool VideoDecoder_Tegra::decode(const char* data, int len)
{
    context_t* aCtx = &ctx[channel];

    if (aCtx->got_error || aCtx->dec->isInError())
        return false;

    int ret;
    struct v4l2_buffer v4l2_buf;
    struct v4l2_plane planes[MAX_PLANES];
    NvBuffer *buffer;

    memset(&v4l2_buf, 0, sizeof(v4l2_buf));
    memset(planes, 0, sizeof(planes));

    if (frame_counter < (int)aCtx->dec->output_plane.getNumBuffers()) {
        // Still filling the initial set of output-plane buffers
        buffer = aCtx->dec->output_plane.getNthBuffer(frame_counter);
        v4l2_buf.index = frame_counter;
        v4l2_buf.m.planes = planes;
        frame_counter++;
    } else {
        // All buffers queued once; reclaim one before reusing it
        v4l2_buf.m.planes = planes;
        ret = aCtx->dec->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, -1);
        if (ret < 0)
        {
            qDebug() << "Error DQing buffer at output plane for channel " << channel;
            aCtx->got_error = true;
            return false;
        }
    }

    // len is the size of the encoded payload in bytes
    memcpy(buffer->planes[0].data, data, len);
    buffer->planes[0].bytesused = len;

    v4l2_buf.m.planes[0].bytesused = buffer->planes[0].bytesused;
    // Note: to signal EOS to the decoder, queue an empty buffer,
    // i.e. set v4l2_buf.m.planes[0].bytesused = 0 and queue it
    ret = aCtx->dec->output_plane.qBuffer(v4l2_buf, NULL);
    if (ret < 0)
    {
        qDebug() << "Error Qing buffer at output plane for channel " << channel;
        aCtx->got_error = true;
        return false;
    }

    return true;
}

When I run 00_video_decode I get the following error, although it decodes successfully:

“Failed to query video capabilities: Inappropriate ioctl for device”

Does that ring a bell?

Here is the complete log:

tegra_multimedia_api/samples/00_video_decode$ ./video_decode H264 -o output.dec
sample_outdoor_car_1080p_10fps.h264
Set governor to performance before enabling profiler
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 7647: NvMMLiteBlockOpen
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
TVMR: cbBeginSequence: 1179: BeginSequence 1920x1088, bVPR = 0
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1529: DecodeBuffers = 5, pnvsi->eCodec = 4, codec = 0
TVMR: cbBeginSequence: 1600: Display Resolution : (1920x1080)
TVMR: cbBeginSequence: 1601: Display Aspect Ratio : (1920x1080)
TVMR: cbBeginSequence: 1669: ColorFormat : 5
TVMR: cbBeginSequence:1683 ColorSpace = NvColorSpace_YCbCr601
TVMR: cbBeginSequence: 1809: SurfaceLayout = 3
TVMR: cbBeginSequence: 1902: NumOfSurfaces = 12, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
TVMR: cbBeginSequence: 1904: BeginSequence ColorPrimaries = 2, TransferCharacteristics = 2, MatrixCoefficients = 2
Video Resolution: 1920x1080
[INFO] (NvEglRenderer.cpp:109) Setting Screen width 1920 height 1080
Query and set capture successful
Input file read complete
TVMR: NvMMLiteTVMRDecDoWork: 6531: NVMMLITE_TVMR: EOS detected
TVMR: FrameRate = 10.000000
TVMR: FrameRate = 10.000000
TVMR: FrameRate = 10.000000
TVMR: FrameRate = 10.000000
TVMR: TVMRBufferProcessing: 5486: Processing of EOS
TVMR: TVMRBufferProcessing: 5563: Processing of EOS Done
Exiting decoder capture loop thread
TVMR: TVMRFrameStatusReporting: 6132: Closing TVMR Frame Status Thread -------------
TVMR: TVMRVPRFloorSizeSettingThread: 5942: Closing TVMRVPRFloorSizeSettingThread -------------
TVMR: TVMRFrameDelivery: 5982: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMLiteTVMRDecBlockClose: 7815: Done
App run was successful

That’s why I said that I’m using a counter to properly control when I dequeue. I no longer leave this in the hands of V4L2, because it will block. I always keep one frame pending for dequeue (I never call dequeue if I know the next dequeue would exhaust the queued frames).
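That bookkeeping can be sketched as a plain counter (illustrative C++ only; the class and method names are mine, not part of the NvV4l2ElementPlane API): count buffers in flight and only issue a blocking dequeue while more than one frame would remain queued.

```cpp
#include <cstddef>

// Illustrative in-flight bookkeeping for an output plane with
// `capacity` buffers. The idea: never issue a blocking dequeue that
// would drain the queue; always leave at least one buffer pending
// inside the decoder.
class InFlightCounter {
public:
    explicit InFlightCounter(size_t capacity) : capacity_(capacity) {}

    // Call after every successful qBuffer()
    void onEnqueue() { ++inFlight_; }
    // Call after every successful dqBuffer()
    void onDequeue() { --inFlight_; }

    // A new payload can be queued directly while free slots remain
    bool haveFreeSlot() const { return inFlight_ < capacity_; }

    // Only dequeue when at least two buffers are in flight, so the
    // decoder always keeps one frame pending and DQBUF cannot starve
    bool safeToDequeue() const { return inFlight_ > 1; }

private:
    size_t capacity_;
    size_t inFlight_ = 0;
};
```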

As for the “Inappropriate ioctl for device” error, that’s new; I’ve never seen it, in my case at least.

Thanks shiretuxiv2i for your reply.
I found the issue.

It seems that if you don’t run the capture loop, the decoder stops accepting input on its output plane. Weird; I did not expect that. Why the feeding loop depends on the capture loop is beyond my understanding.
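That dependency can be pictured with a plain bounded pool (illustrative C++ only, no V4L2 involved; all names are mine): a producer filling a fixed-size pool stalls as soon as the pool is full, until a consumer drains it. This is effectively what happens to the output plane when no capture loop is returning buffers.

```cpp
#include <deque>
#include <cstddef>

// Illustrative bounded pool, standing in for the decoder's fixed set
// of output-plane buffers. tryPush() models queuing a bitstream
// buffer; pop() models the recycling that only happens while the
// capture side is being serviced.
class BoundedPool {
public:
    explicit BoundedPool(size_t capacity) : capacity_(capacity) {}

    // Succeeds only while a free slot exists; with no consumer
    // running, this starts failing (the real DQBUF would block
    // forever) once the pool is full.
    bool tryPush(int item) {
        if (q_.size() >= capacity_) return false;
        q_.push_back(item);
        return true;
    }

    // Consumer side: returns a slot to the pool. Without this
    // running, the producer above can never make progress again.
    bool pop() {
        if (q_.empty()) return false;
        q_.pop_front();
        return true;
    }

private:
    size_t capacity_;
    std::deque<int> q_;
};
```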

Anyway, do you know what the difference is between defining or not defining “USE_NVBUF_TRANSFORM_API” in the decoder sample code?

Digging into the video_dec_cuda sample, it seems the NVBUF_TRANSFORM_API path does not use CUDA. Is that true?
The video_dec_cuda sample does not use the NVBUF_TRANSFORM_API path when converting the decoder output from block-linear to pitch-linear.

So my question is: if I want hardware/CUDA support for the conversion, which path should I use, the NVBUF_TRANSFORM_API path or the other one?

Hi,

USE_NVBUF_TRANSFORM_API selects the NvBuffer APIs defined in nvbuf_utils.h instead of the NvVideoConverter class. They are different frameworks but use the same HW component (VIC). Both can do CUDA processing via the same function calls:

NvEGLImageFromFd()
HandleEGLImage()
NvDestroyEGLImage()

You can call dec->abort() first, and then stop your dqBuffer thread.

It works fine for me!