Allocating a GstBuffer of type "memory:NVMM"

ShacharShemesh · June 15, 2020, 7:48am

Hello everyone,

I’m trying to encode video I already processed using CUDA with the nvv4l2h265enc encoder using GStreamer. To do so, I need to pass it a GstBuffer that matches the filter it wants (it says it needs memory:NVMM). I am slowly piecing together what that actually means.

By reading the dsexample source code, I’ve already found that I need to give it a GstBuffer whose mapped memory is a pointer to an NvBufSurface. What I’m still trying to figure out:

The surface does not have any CUdeviceptrs. I think that NvBufSurface’s dataPtr is it, but I would love to get it verified.
I’m fairly sure there is supposed to be a standard allocator for this buffer type: some GstAllocator that I can pass to gst_buffer_new_allocate to get the GstBuffer already correctly allocated. Since dsexample does not allocate new buffers, I don’t know what that allocator is.

Any help would be greatly appreciated.
Thank you,
Shachar

ShacharShemesh · June 16, 2020, 5:50am

I’ve created a pipeline that says:

filesrc ! qtdemux ! h264parse ! nvv4l2decoder ! nvv4l2h265enc ! h265parse ! qtmux ! filesink

So it transcodes the video component of a file from h264 to h265. The resulting file is a valid mp4 with h265 encoded video, so I know that the pipeline works.

I installed a probe on the sink pad of the nvv4l2h265enc element, and when I examine the incoming buffers, I see:

(gdb) p *surface
$1 = {gpuId = 1008, batchSize = 0, numFilled = 0, isContiguous = false, memType = NVBUF_MEM_CUDA_PINNED, surfaceList = 0x0, _reserved = {0x0, 0x0, 0x0, 0x0}}

So, the NvBufSurface part seems to be empty. And yet, the buffers get encoded. I am at a loss as to how to proceed from here.

eenav · June 17, 2020, 6:01am

You can use NvBufSurfaceCreate to create NvBufSurface: NVIDIA DeepStream SDK API Reference: Buffer Surface Management API

ShacharShemesh · June 17, 2020, 6:22am

The problem isn’t creating a NvBufSurface. The problems are:

Extracting a CUdeviceptr from it, so I can run my filters.
Passing it inside a GstBuffer in a way that is compatible with what nvv4l2h265enc expects, so that release doesn’t corrupt memory.

DaneLLL · June 17, 2020, 8:34am

Hi,
Please refer to the sample (appsrc_nvmm.zip).

ShacharShemesh · June 17, 2020, 2:32pm

Thank you for that sample. I am still processing it, but it’s definitely a step in the right direction. Thank you.

I have one quibble. The program this is supposed to integrate with runs on a headless server. For using CUdeviceptr, this is not a problem. This program, however, calls eglGetDisplay, which obviously fails if no display is present (e.g., if DISPLAY is not set).

Assuming all I want is to encode, is opening a display really necessary? Is there a way to process the images without a display available?

Thank you again,
Shachar

eenav · June 22, 2020, 9:12am

Hi @DaneLLL, could you please respond?
Thanks

ShacharShemesh · June 22, 2020, 9:15am

I also ran into another, more major, concern. gst-inspect claims that changing the bitrate is only possible when the element is in READY or NULL state. Does that mean that every time I change the bitrate (CBR mode) the encoder will issue a new IDR? Because if so, this product is unusable to us :-(

Thank you,
Shachar

eenav · June 22, 2020, 9:18am

I know there is a per-frame QP setting that can be used to control the bitrate,
but perhaps @DaneLLL can add to this.

ShacharShemesh · June 22, 2020, 9:20am

I do not want to try to guess the correct QP. I know how much bandwidth I have, and want the encoder to get the maximal quality out of it.
I’m sorry for repeating this, but it’s something that NvEnc knows how to do.

I might be able to hack something out of VBR and setting the maximal bandwidth below the average bandwidth (would that assert?), as that property can be set without stopping the pipeline.

DaneLLL · June 22, 2020, 9:26am

Hi,
Please refer to

So if you have the drivers installed, the call should succeed even though the device is headless.

eglGetDisplay(EGL_DEFAULT_DISPLAY);

DaneLLL · June 22, 2020, 9:40am

Hi,
For changing bitrate dynamically, please refer to

We are deprecating omx plugins, so please use v4l2 plugins such as nvv4l2h264enc.

ShacharShemesh · June 22, 2020, 10:43am

Hi DaneLLL,

Thank you, but this is not quite what I was referring to. An encoder has two modes of operation: Constant Bit Rate, or CBR, which means that all frames receive exactly the same bitrate, and Variable Bit Rate, or VBR, in which the bitrate you set is the average bit rate for the encoding. Both NvV4l2H265Enc and NvEnc support both modes (for the former this is done using the control-rate property).

With VBR, the encoder also has a property called peak-bitrate, that sets the maximal amount of bit rate a single frame may “steal” from the running average. For CBR this property is, of course, meaningless.

For what we want to do, VBR is not an option. We need all frames to be encoded using the same bit rate. For that reason, the link you sent me, employing VBR, is irrelevant. Our constant bit rate, however, changes over time. Under NvEnc, this merely meant calling nvEncReconfigureEncoder.

For NvV4l2H265Enc, changing the bitrate requires changing the bitrate property. This is what gst-inspect-1.0 has to say about it:

  bitrate             : Set bitrate for v4l2 encode
                        flags: readable, writable, changeable only in NULL or READY state
                        Unsigned Integer. Range: 0 - 4294967295 Default: 4000000

So, in order to set a new constant bit rate, I need to put the pipeline into READY, set a new bit rate, and then set it to PLAYING again.

What I’m worried about is that this stopping and restarting will result in the encoder restarting, causing it to send a new IDR. IDRs significantly reduce the video’s quality, and I would like to avoid them.

I hope my question is clearer.

Thank you,
Shachar

DaneLLL · June 22, 2020, 11:24pm

Hi,
It may not be shown correctly in gst-inspect-1.0. You can change the bitrate at runtime. Please check the attachment and give it a try:

$ g++ -Wall -std=c++11  test2.cpp -o test $(pkg-config --cflags --libs gstreamer-app-1.0)

nvidia@nvidia-desktop:~$ ./test
Using launch string: nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1920, height=1080, framerate=30/1,format=(string)NV12 ! nvv4l2h264enc control-rate=1 name=video_enc ! video/x-h264,stream-format=byte-stream ! appsink name=mysink
Default bit rate is 2Mbps
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 2592 x 1944 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 34000, max 550385000;

GST_ARGUS: 2592 x 1458 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 34000, max 550385000;

GST_ARGUS: 1280 x 720 FR = 120.000005 fps Duration = 8333333 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 22000, max 358733000;

GST_ARGUS: Running with following settings:
   Camera index = 0
   Camera mode  = 1
   Output Stream W = 2592 H = 1458
   seconds to Run    = 0
   Frame Rate = 29.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
H264: Profile = 66, Level = 0
bitrate = 1964016 bps
bitrate = 1402816 bps
bitrate = 2135232 bps
bitrate = 2034696 bps
Change bit rate to 4Mbps
bitrate = 3695272 bps
bitrate = 3424664 bps
bitrate = 3963344 bps
bitrate = 4294328 bps
bitrate = 3840864 bps
bitrate = 3819248 bps
bitrate = 4085104 bps
bitrate = 4266448 bps
bitrate = 3933472 bps
bitrate = 4129880 bps
app sink receive eos
GST_ARGUS: Cleaning up
CONSUMER: Done Success
GST_ARGUS: Done Success
going to exit

test2.zip (1.7 KB)

ShacharShemesh · June 23, 2020, 9:08am

Hi,

Thank you. The bitrate command did, indeed, work while playing. As you are no doubt surprised to hear, this uncovered a new batch of questions. Sorry about that.

I can’t seem to get force-IDR to work

I’ve tried the following code:

        GstFlowReturn ret;
        g_signal_emit_by_name (_encoder, "force-IDR", &ret, nullptr);

This results in a segmentation fault inside the encoder. I’ve also tried without the nullptr and the result pointers with no difference in effect.

I would like to know info about what was encoded

NvEnc reports back to us the averageQP and the frame type encoded (I, P or B frames). I did not find anywhere that NvV4l2 does the same (did I miss something?).
gst-inspect-1.0 says that NvV4l2 expects the buffer in one of several formats. Our format of choice is I420, mostly because that’s the one we use for NvEnc, and our filters already assume it. I saw no way of allocating such a frame using NvBufferCreate. The closest I found was NvBufferColorFormat_YUV420, which has the same color format, but not the same memory layout. What would be the best way to allocate an I420 buffer?
Last (and much less important), if I understand the GStreamer docs correctly, the buffer I pass in should carry a frame index in the offset field, and I should get that same index on the output. I made sure to set it, but this doesn’t happen. On output, offset is always 0 and offset_end is always 1.

Encoded sizes

This is of somewhat secondary importance, but still worth noting, in case there is a way to fix this. Especially in low bandwidth encoding, the encoder seems to grossly over-use the bandwidth allocation when sending an IDR.

Thank you very much for your help,
Shachar

DaneLLL · June 24, 2020, 4:36am

Hi,
For forcing IDR frames, please try

g_signal_emit_by_name (encoder, "force-IDR", NULL, &ret);

We have it in jetson_multimedia_api. You can run

01_video_encode$ ./video_encode /home/nvidia/a.yuv 320 240 H264 /home/nvidia/a.264 --report-metadata

It is not in gst-v4l2. The gst-v4l2 package is open source, so you would need to port it from jetson_multimedia_api to gst-v4l2 manually.

Yes, I420 is NvBufferColorFormat_YUV420.

For matching video buffers between input and output of encoder, we usually check pts in GstBuffer.
https://gstreamer.freedesktop.org/documentation/gstreamer/gstbuffer.html?gi-language=c#GstBuffer
Please check if this is good.
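
For illustration, matching by pts could be sketched roughly as follows. This is only a sketch under assumptions: FrameTracker, onInput and onOutput are hypothetical names, not part of GStreamer or any NVIDIA API. In a real pipeline the pts values would presumably be read with GST_BUFFER_PTS() in probes on the encoder's sink and src pads, and the sketch assumes the encoder passes timestamps through unchanged and that each pts is unique.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper: remembers which frame index was submitted with which
// pts, so the index can be recovered when the encoded buffer comes out.
class FrameTracker {
public:
    // Call for every buffer pushed into the encoder.
    void onInput(uint64_t pts, uint64_t frameIndex) { pending_[pts] = frameIndex; }

    // Call for every buffer leaving the encoder. Returns true and fills
    // *frameIndex if the pts was seen on input; the entry is then forgotten.
    bool onOutput(uint64_t pts, uint64_t *frameIndex) {
        assert(frameIndex != nullptr);
        auto it = pending_.find(pts);
        if (it == pending_.end()) {
            return false;
        }
        *frameIndex = it->second;
        pending_.erase(it);
        return true;
    }

private:
    std::unordered_map<uint64_t, uint64_t> pending_;
};
```

Erasing the entry on output keeps the map bounded by the number of frames in flight inside the encoder.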

ShacharShemesh · June 24, 2020, 5:23am

Thank you. I want to verify that I understood what you’re suggesting correctly.

Are you saying that the NVidia V4L2 encoder uses the gst-v4l2 plugin as a library, so that if I install a new version of the latter, I can add functionality to the former? I would also like to verify that, when you talk about an open source plugin, you are referring to the sys/v4l2 directory under the “gst-plugins-good” repository. There is a little legal challenge involved with that route, as that plugin is (an ancient version of) LGPL, but not one we cannot overcome.

It was my understanding that I420 also says that the buffers are consecutive in memory (i.e. - accessed with one pointer). As far as I can tell, the buffers I get come from three different allocations. Again, I can work with that. I just want to confirm that I’m not missing something, and that this is an acceptable input to the encoder.
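
To make concrete what I mean by "consecutive", here is a small sketch of the layout I have in mind: the conventional packed I420 arrangement, with no row padding, reachable through a single pointer. The names I420Layout and packedI420Layout are mine, purely for illustration, and this describes the plain system-memory convention, not necessarily what the NVMM buffers do.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets of the three planes inside one contiguous, unpadded I420 frame.
struct I420Layout {
    size_t yOffset;
    size_t uOffset;
    size_t vOffset;
    size_t totalSize;
};

// Compute the packed layout: full-resolution Y plane, followed by the U and V
// planes, each subsampled by two in both directions.
I420Layout packedI420Layout(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    const size_t ySize = width * height;
    const size_t chromaSize = (width / 2) * (height / 2);
    return I420Layout{0, ySize, ySize + chromaSize, ySize + 2 * chromaSize};
}
```

For a 1920x1080 frame this puts U at byte 2073600, V at byte 2592000, and gives a total of 3110400 bytes.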

Thank you for all your help,
Shachar

ShacharShemesh · June 24, 2020, 11:39am

Same thing. Segfault. I’ve also tried more parameters and populating them with valid addresses. It still segfaults.

ShacharShemesh · June 25, 2020, 3:38pm

Hi @DaneLLL,

Aside from all the other problems, it seems the basic premise of the question did not receive a good answer yet :-(

It seems that buffers I pass to the encoder via GStreamer are not processed correctly. They seem to have some strange mis-interpretation to them, with the lines going all slanty (I’m sorry about the non-technical term).

I’ve recreated the issue for a single frame by using the following gstreamer pipeline, and then analyzing the file as raw yuv:

appsrc name=injector num-buffers=1 ! video/x-raw(memory:NVMM),width=1920,height=1080,framerate=25/1,format=I420 ! nvvidconv ! video/x-raw ! filesink location=/tmp/frame.yuv

I’ve also dumped the buffer to a file using the following commands:

	size_t rawSize = inputPicture.size.width*inputPicture.size.height;
	void *y, *u, *v;
	NvBufferMemMap( outputBuffer.getFd(), 0, NvBufferMem_Read, &y );
	NvBufferMemMap( outputBuffer.getFd(), 1, NvBufferMem_Read, &u );
	NvBufferMemMap( outputBuffer.getFd(), 2, NvBufferMem_Read, &v );
	NvBufferMemSyncForCpu( outputBuffer.getFd(), 0, &y );
	NvBufferMemSyncForCpu( outputBuffer.getFd(), 1, &u );
	NvBufferMemSyncForCpu( outputBuffer.getFd(), 2, &v );
	dumpFd = open("/tmp/input.yuv", O_CREAT|O_TRUNC|O_WRONLY, 0666);
	write(dumpFd, y, rawSize);
	write(dumpFd, u, rawSize/4);
	write(dumpFd, v, rawSize/4);
	NvBufferMemUnMap( outputBuffer.getFd(), 0, &y );
	NvBufferMemUnMap( outputBuffer.getFd(), 1, &u );
	NvBufferMemUnMap( outputBuffer.getFd(), 2, &v );
	close(dumpFd);

What I see is that displaying input.yuv shows a valid 4:2:0 Planar picture at 1920x1080. frame.yuv, on the other hand, kind of makes sense if you treat it as 1800x1152, except it contains lines that have a smaller stride of 1792, and every 16 lines a line is skipped.

I am attaching both files as PNG (I cannot upload yuv due to permissions on the forum). For frame.yuv it is produced with stride of 1800. You can see that if you zoom out, it is obvious that it is the same picture (including correct colors).

As I said above, I’m showing this using a static picture, but the generated HEVC suffers from a substantially similar problem.

Help?

Thank you,
Shachar

DaneLLL · June 28, 2020, 10:57pm

Hi,
Please check dump_dmabuf() in

/usr/src/jetson_multimedia_api/samples/common/classes/NvUtils.cpp

Please read the pitch of each plane and copy the data line by line.
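
A minimal sketch of the line-by-line copy, independent of the Jetson headers: packPlane is a hypothetical helper name, and the pitch argument is assumed to be the per-plane pitch reported for the buffer (for example by NvBufferGetParams, as dump_dmabuf() appears to do). It copies only the visible bytes of each row and skips the padding between rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy one plane whose rows start `pitch` bytes apart into a tightly packed
// buffer whose rows are exactly `width` bytes long.
std::vector<uint8_t> packPlane(const uint8_t *src, size_t width,
                               size_t height, size_t pitch)
{
    assert(src != nullptr);
    assert(pitch >= width);
    std::vector<uint8_t> packed(width * height);
    for (size_t row = 0; row < height; ++row) {
        // Source rows are spaced by pitch, destination rows by width.
        std::memcpy(packed.data() + row * width, src + row * pitch, width);
    }
    return packed;
}
```

For an I420 buffer this would be called once per plane, with the chroma planes using half the luma width and height and their own pitch.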