NVMM out of order buffers

Hello,

We’re developing a GStreamer-based application for a Jetson Nano and a Jetson TX2, and we found a strange behavior when using NVMM memory along with the appsrc element. Depending on some conditions I will detail later, the buffers come out of nvvidconv out of order.

The pipelines we’re using are essentially like these:

Producer: videotestsrc ! nvvidconv ! video/x-raw(memory:NVMM) ! fakesink
Consumer: appsrc ! nvvidconv ! video/x-raw ! ...

We capture the frames that arrive at fakesink using a pad probe and push them into the appsrc element. We noticed the frames come out of the second nvvidconv (the one linked to appsrc) out of order.

However, we could only reproduce this issue under certain conditions:

  • nvvidconv must be used with memory:NVMM. With CPU memory, the issue doesn’t appear.
  • The output-buffers property of nvvidconv must be small (the default is 4). If it’s high (e.g. 50) the issue doesn’t appear.
  • The consumer pipeline (the one with appsrc) must be set to PLAYING a bit later than the producer one (the one with videotestsrc). The number of out-of-order buffers seems to be related to this delay.

Please find attached the main.c and the Makefile we prepared to reproduce this issue easily. The original application uses nvarguscamerasrc and has the same problem; we replicated the issue with videotestsrc and nvvidconv, but nvvidconv is not needed to reproduce it when nvarguscamerasrc pushes directly into NVMM memory. Usage:

  • Running make will launch 4 different tests and put the output frames under the out/<name_of_the_test> folders
  • With make test_with* one can launch a single test

In the out/<name_of_the_test> folders one can find 50 frames in PNG format. Each image contains two timestamps, in the top-left and top-right corners: the top-left one is the timestamp from the producer pipeline, the top-right one is from the consumer pipeline. A quick inspection is enough to see that the frames come out of order only in the folder out/test_with_nvvidconv.

Please notice we’re using gst_buffer_copy() just before pushing each buffer into appsrc:

buffer = gst_buffer_copy(info->data);
gst_app_src_push_buffer(consumer_src, buffer);

I also found this issue, which could be related: frames returned from nveglstreamsrc via EGL stream out of order
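
For completeness, the probe callback is essentially the following. This is only a minimal sketch: the function name, the probe type and passing the appsrc through user_data are assumptions, and only the two lines above are verbatim from our code.

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Sketch of the pad probe attached to the fakesink sink pad with
 * GST_PAD_PROBE_TYPE_BUFFER; the consumer appsrc is passed as user_data. */
static GstPadProbeReturn
producer_buffer_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstAppSrc *consumer_src = GST_APP_SRC (user_data);
  GstBuffer *buffer;

  /* gst_buffer_copy() creates a new GstBuffer; the original stays in the
   * producer pipeline and is unreffed by fakesink afterwards. */
  buffer = gst_buffer_copy (GST_PAD_PROBE_INFO_BUFFER (info));

  /* appsrc takes ownership of the pushed buffer. */
  gst_app_src_push_buffer (consumer_src, buffer);

  return GST_PAD_PROBE_OK;
}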

Thank you,
Carlos.


Here is the testing program:
nvmm_test.tar.gz (2.3 KB)

I cannot attach it to the main post because I’m a new user.


Hi,
Please check if it happens when running a pipeline like:

videotestsrc is-live=1 ! nvvidconv ! fakesink sync=0

We would like to eliminate the identity and timeoverlay plugins to check if the issue is still present. Please also check the timestamps of the GstBuffers and see if they are correct. The nvvidconv plugin is open source and it simply calls NvBuffer APIs for the required conversion/scaling. It is more likely that the buffers are out of order due to other plugins.
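
For example, the PTS of every GstBuffer reaching a pad can be printed with a simple probe like the sketch below (attach it with gst_pad_add_probe() and GST_PAD_PROBE_TYPE_BUFFER):

/* Sketch: print the PTS of every buffer passing through a pad. */
static GstPadProbeReturn
print_pts (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  g_print ("%s: PTS %" GST_TIME_FORMAT "\n",
      GST_OBJECT_NAME (pad), GST_TIME_ARGS (GST_BUFFER_PTS (buf)));
  return GST_PAD_PROBE_OK;
}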

You may download the source code and try to add prints for further checking. The source code is in
https://developer.nvidia.com/embedded/linux-tegra-r3273

Driver Package (BSP) Sources

Hello DaneLL,

Thank you very much for your fast response.

The pipeline you proposed doesn’t reproduce the issue. We could only reproduce it using memory:NVMM together with appsrc.

The issue can also be reproduced without identity and timeoverlay elements.

The PTS of the buffers are correct even when they arrive out of order. With the command GST_DEBUG=5 GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=gst.log make test_with_nvvidconv and the following plotting script, one can produce the attached plot:

$ GST_DEBUG=5 GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=gst.log make test_with_nvvidconv
...
$ cat log_plotter.sh
#!/bin/bash -e

pts_list_from_identity() {
	grep "basetransform.*${2}.*PTS" "${1}" \
	| sed -e 's/^\([0-9:.]*\).*PTS \([0-9:.]*\).*$/\1 \2/'
}

plot_buffers_pts() {
	pts_list_from_identity "${1}" "prod_identity" > prod_identity.data
	pts_list_from_identity "${1}" "cons_identity" > cons_identity.data
	gnuplot -p <<-EOF
		set xtics rotate by -45
		set xdata time
		set ydata time
		set format y '%M:%.9S'
		set format x '%M:%.2S'
		set timefmt '%H:%M:%S'
		set autoscale
		plot \
			'prod_identity.data' using 1:2 with lines, \
			'cons_identity.data' using 1:2 with lines
	EOF
}

plot_buffers_pts "${1}"
$ ./log_plotter.sh gst.log

[Attached plot: buffer PTS over log time for prod_identity and cons_identity]

Hello,

We strongly suspect the issue is that GstNvFilterMemoryAllocator does not implement GstMemoryCopyFunction, only GstMemoryMapFunction and GstMemoryUnmapFunction. Therefore the fallback copy method, _fallback_mem_copy() in gstallocator.c, is used. That method just memcpy()s the internal structure NvBufferParams->nv_buffer, not the full buffer.
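
For context, the fallback does roughly the following (a simplified sketch, not the verbatim upstream code; error handling and the exact allocator selection are omitted):

/* Simplified sketch of _fallback_mem_copy() in gstallocator.c. */
static GstMemory *
fallback_mem_copy_sketch (GstMemory *mem, gssize offset, gssize size)
{
  GstMemory *copy;
  GstMapInfo sinfo, dinfo;

  gst_memory_map (mem, &sinfo, GST_MAP_READ);
  if (size == -1)
    size = sinfo.size - offset;

  copy = gst_allocator_alloc (NULL, size, NULL);
  gst_memory_map (copy, &dinfo, GST_MAP_WRITE);

  /* With memory:NVMM, sinfo.data points at the small NvBuffer handle
   * (NvBufferParams->nv_buffer), not at the pixel planes, so only the
   * handle is duplicated here; the actual surface is never copied. */
  memcpy (dinfo.data, sinfo.data + offset, size);

  gst_memory_unmap (copy, &dinfo);
  gst_memory_unmap (mem, &sinfo);
  return copy;
}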

We think this could explain why we can easily reproduce the issue when appsrc has several buffers in its queue (more than 4 by default), because this increases the chance that nvvidconv reuses one of the buffers that are still in use. The steps would be something like:

  1. nvvidconv allocates a new buffer (with a new NvBufferParams).
  2. We pick this buffer up in the probe and call gst_buffer_copy() (which copies the entire NvBufferParams). This doesn’t increase the refcount of the original buffer, but creates a new one.
  3. appsrc stores the copy in its queue.
  4. fakesink unrefs the buffer, so nvvidconv can now use this surface again.
  5. Repeat the previous steps 3 more times.
  6. nvvidconv has already used its 4 surfaces (output-buffers=4), so it’s going to reuse one of the previous ones. The problem is that those surfaces have not been processed by the consumer yet.

It would also explain why it’s harder to reproduce when a bigger output-buffers value is used, or when the consumer pipeline starts at the same time as the producer one, as the appsrc queue never fills up.
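
For reference, this is roughly what we imagine a real copy callback would have to do for NVMM memory. This is a hypothetical sketch only: whether gst_allocator_alloc() on this allocator hands out a fresh NVMM surface, and whether the mapped pointer is accepted by ExtractFdFromNvBuffer(), are assumptions on our side; NvBufferTransform() and ExtractFdFromNvBuffer() are public nvbuf_utils APIs.

/* Hypothetical mem_copy callback for the NVMM allocator (sketch only).
 * offset/size handling is omitted; a full-surface copy is assumed. */
static GstMemory *
nvfilter_mem_copy (GstMemory *mem, gssize offset, gssize size)
{
  GstMemory *copy;
  GstMapInfo smap, dmap;
  int src_fd = -1, dst_fd = -1;
  NvBufferTransformParams params = { 0 };   /* no crop, no flip, default filter */

  if (!gst_memory_map (mem, &smap, GST_MAP_READ))
    return NULL;

  /* Assumption: asking the same allocator for a new memory of the same
   * size yields a fresh NVMM surface. */
  copy = gst_allocator_alloc (mem->allocator, mem->maxsize, NULL);
  if (copy == NULL || !gst_memory_map (copy, &dmap, GST_MAP_WRITE)) {
    if (copy)
      gst_memory_unref (copy);
    gst_memory_unmap (mem, &smap);
    return NULL;
  }

  /* Resolve the dmabuf fds behind both surfaces and do a
   * hardware-accelerated surface-to-surface copy. */
  if (ExtractFdFromNvBuffer (smap.data, &src_fd) == 0 &&
      ExtractFdFromNvBuffer (dmap.data, &dst_fd) == 0)
    NvBufferTransform (src_fd, dst_fd, &params);

  gst_memory_unmap (copy, &dmap);
  gst_memory_unmap (mem, &smap);
  return copy;
}

Such a callback would then be assigned to the allocator’s mem_copy field in the nvvidconv sources.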

Please let us know your thoughts on this matter.

Regards,
Carlos

Hi,
By default the mem_copy() callback is not implemented, since we don’t see it being called in our test cases. It is possible the callback is required and called in certain use cases. Could you try to implement the callback and see if it works for this use case?
