NvBuffer dmabuf-fd zero-copy to GstBuffer question

In the jetson-multimedia-api sample 12_camera_v4l2_cuda, I want to feed the dmabuf-fd into the appsrc of a GStreamer pipeline for H265 UDP streaming.
The method in this topic uses "g_malloc", "gst_buffer_new_wrapped_full" and "memcpy".
Is there really no memory copy? Is this method zero copy?

You are right. There is no copying of data from a CPU buffer to an NVMM buffer in the sample. The NVMM buffer data is put into the GstBuffer directly.

My test camera is 1280x960, UYVY format.
The nv_buffer_size is 1008, and the H265 UDP streaming latency test shows 9 frames (60 fps video on a 120 Hz HDMI screen), so the latency is 9 * 16.7 = 150 ms.
But with the same code switched to my old method, the H265 UDP streaming latency test is also 9~10 frames.
Why doesn't the DMA method have a clear advantage? And why does the DMA method still have to memcpy 1008 bytes? Is this a true zero copy?
old method:

    NvBufferMemSyncForCpu(fd, 0, (void **)&(pThis->fd_va[index]));
    buffer = gst_buffer_new_allocate(NULL, pThis->n_size, NULL);

n_size is the RGBA size of 1280x960: 1280 x 960 x 4 = 4915200 bytes

The sample is for demonstration and is not optimal. It creates and destroys an NvBuffer in each feed_function() and notify_to_destroy(). It is better to create multiple NvBuffers and re-use them, so that you only need to do this in feed_function():

    data = g_malloc(par.nv_buffer_size);

    buffer = gst_buffer_new_wrapped_full(flags, data, par.nv_buffer_size,
                                         0, par.nv_buffer_size, data, g_free);
    buffer->pts = timestamp;

    gst_buffer_map(buffer, &map, GST_MAP_WRITE);
    memcpy(map.data, par.nv_buffer, par.nv_buffer_size);
    gst_buffer_unmap(buffer, &map);

This only needs to copy nv_buffer_size bytes instead of the full frame data (width * height * 1.5 bytes for YUV420).

Yes, I create 3 NvBuffers and re-use the buffers, with feed_function written the way you recommend.
In the real H265 UDP stream latency test, there is not much difference in this comparison.
Can we use gst_dmabuf_allocator_alloc() instead, so that no copy is needed at all?

This may not be the cause of the latency. It very likely comes from the source or encoder settings. Please try the steps in this post and check if you get an identical result:
Gstreamer TCPserversink 2-3 seconds latency - #5 by DaneLLL

And set insert-sps-pps=1 idrinterval=15 in your use case and try.

Yes, I used GStreamer pipelines to check; the nvv4l2camerasrc and v4l2src pipelines have the same H264 UDP stream latency [60 fps video on a 120 Hz HDMI screen, 6~9 frames latency].
tx2 rtp send pipeline:

win 10 gstreamer recv pipeline:

nvv4l2camerasrc 240fps camera record

v4l2src 240fps camera record

Please try the videotestsrc plugin and check the latency. We see negligible latency in our setup. If you see a larger value, it should come from the network.

If the network is good and stable, it should be similar to this:
Gstreamer TCPserversink 2-3 seconds latency - #5 by DaneLLL

Also, the camera source itself can add a certain latency when capturing frames.

With the videotestsrc plugin the latency is stable at 33 ms.
This means that the encoding delay, transmission delay, and decoding delay together are about 1 frame.

33 ms looks reasonable. Please make sure you run sudo nvpmodel -m 0 and sudo jetson_clocks, and set max performance on the encoder. This is the optimal mode of Jetson Nano.

Yes, I have set max performance!
The network latency is very small; it is mainly the delay of the camera acquiring data.
Is there room to optimize the camera data latency in the jetson-multimedia-api sample 12_camera_v4l2_cuda?

Please share more information about the camera (vendor and model ID). Do you use a product from one of our camera partners?