nvvidconv stuck after "gst_nv_filter_buffer_pool_release_buffer:<nvfilterbufferpool1> release_buffer"

Please help with a hang encountered when using nvvidconv with EGL input.

The test pipeline:

gst_parse_launch("nveglstreamsrc name=eglSrc ! video/x-raw(memory:NVMM), format=RGBA, width=960, height=540, framerate=15/1 ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! fakesink", &errMsg)

After launching, nveglstreamsrc is given the "display" and "eglstream" handles, and the pipeline is set to the PLAYING state.

The user can toggle the app to destroy the pipeline and reconstruct it. Destruction follows the usual procedure: send an EOS down the pipeline, wait on the bus for the EOS message, set the pipeline to NULL, and unref it.

We found that after the pipeline is destroyed the first time and constructed a second time, nvvidconv gets stuck, as shown in the following log messages:

0:00:42.197325824 7910 0x7e400040a0 DEBUG nveglstream gstnveglstreamsrc.cpp:833:NvEglStreamSrcUpdateFrame: IN: NvEglStreamSrcUpdateFrame

0:00:42.197370368 7910 0x7e400040a0 DEBUG nveglstream gstnveglstreamsrc.cpp:835:NvEglStreamSrcUpdateFrame: acquireFrame=0x7e650075e0 releaseFrame=(nil)

0:00:42.197389728 7910 0x7e400040a0 DEBUG nveglstream gstnveglstreamsrc.cpp:846:NvEglStreamSrcUpdateFrame: OUT: NvEglStreamSrcUpdateFrame

0:00:42.197744352 7910 0x7e40004140 DEBUG nvvidconv gstnvvconv.c:695:gst_nv_filter_buffer_pool_acquire_buffer: acquire_buffer
0:00:42.198051808 7910 0x7e40004140 DEBUG nveglstream gstnveglstreamsrc.cpp:833:NvEglStreamSrcUpdateFrame: IN: NvEglStreamSrcUpdateFrame

0:00:42.198077184 7910 0x7e40004140 DEBUG nveglstream gstnveglstreamsrc.cpp:835:NvEglStreamSrcUpdateFrame: acquireFrame=(nil) releaseFrame=0x7e66262148

0:00:42.198090048 7910 0x7e40004140 DEBUG nveglstream gstnveglstreamsrc.cpp:846:NvEglStreamSrcUpdateFrame: OUT: NvEglStreamSrcUpdateFrame

0:00:42.198117632 7910 0x7e40004140 DEBUG nvvidconv gstnvvconv.c:716:gst_nv_filter_buffer_pool_release_buffer: release_buffer
0:00:42.251773888 7910 0x7e400040a0 DEBUG nveglstream gstnveglstreamsrc.cpp:833:NvEglStreamSrcUpdateFrame: IN: NvEglStreamSrcUpdateFrame

0:00:42.251815456 7910 0x7e400040a0 DEBUG nveglstream gstnveglstreamsrc.cpp:835:NvEglStreamSrcUpdateFrame: acquireFrame=0x7e650075e0 releaseFrame=(nil)

release_buffer is the last log from nvvidconv, while nveglstream went on to acquire several more frames. The pipeline hung from this point.

What does release_buffer in nvvidconv mean? And what signal or event is it waiting on from this point?
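To pin down where each element goes quiet in a long log, it can help to extract the timestamp and debug category from every line and keep the last timestamp seen per category. The sketch below assumes the default uncoloured GST_DEBUG layout shown in the logs above (time, PID, thread pointer, level, category). The type and function names are hypothetical helpers, not part of GStreamer.

```c
#include <stdio.h>
#include <string.h>

/* Leading fields of one GST_DEBUG log line. Buffer sizes are arbitrary. */
typedef struct {
  char timestamp[32];
  int  pid;
  char thread[32];
  char level[16];
  char category[64];
} DebugLogLine;

/* Parse "TIME PID THREAD LEVEL CATEGORY ...".
 * Returns 1 on success, 0 if the line is not a GST_DEBUG line
 * (for example "free(): invalid pointer"). */
int parse_gst_log_line(const char *line, DebugLogLine *out)
{
  int n = sscanf(line, "%31s %d %31s %15s %63s",
                 out->timestamp, &out->pid, out->thread,
                 out->level, out->category);
  return n == 5 && strncmp(out->thread, "0x", 2) == 0;
}

/* Copy into ts_out the timestamp of the last line whose category matches.
 * Returns 1 if the category was seen at least once, 0 otherwise. */
int last_seen(const char *const *lines, size_t count, const char *category,
              char *ts_out, size_t ts_len)
{
  int found = 0;
  for (size_t i = 0; i < count; i++) {
    DebugLogLine parsed;
    if (parse_gst_log_line(lines[i], &parsed) &&
        strcmp(parsed.category, category) == 0) {
      snprintf(ts_out, ts_len, "%s", parsed.timestamp);
      found = 1;
    }
  }
  return found;
}
```

Comparing the result for "nvvidconv" with the result for "nveglstream" shows how long the source kept producing after the converter stopped logging. Colour escape codes would break the parse, so the log is assumed to be captured with GST_DEBUG_NO_COLOR=1.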

Please advise. Thanks.

Note: The EGL handles do not come from the NVIDIA libEGL.

Please consider this issue urgent: it prevents us from stress testing our video pipelines and makes it impossible to connect and reconnect.

Hi,
We have a sample that demonstrates nveglstreamsrc:

/usr/src/jetson_multimedia_api/argus/samples/gstVideoEncode/

It is specific to nvarguscamerasrc, and your use case looks different. We would need you to provide a patch to this sample, or test code, so that we can replicate the issue. Also, are you using JP4.4.1 (r32.4.4)?

Yes, we are on JetPack 4.4.1.

We will look into the code and prepare a patch. Thanks for your swift response.

A bit more detail on the issue: the real pipeline is a lot more complicated than the one posted here, and we hit the nvvidconv hang with that pipeline. We decided to strip the pipeline down to locate the issue. It came down to just two elements connected to a fakesink.

nveglstreamsrc → nvvidconv → fakesink.

When this simplified pipeline is taken down (set to NULL and unreffed), the EGL handles should be released and the provider (our code) should be free to destroy them. Are there any timing or signal conditions we need to be aware of?

Hi,
The nvvidconv plugin is widely used in many use cases, so the issue might lie in nveglstreamsrc. We would need to replicate it and investigate.

Understood and agree.

In our application, the pipeline is configured as follows:

nvarguscamerasrc → nveglstreamsrc → nvvidconv → omxh265enc → …

There is some video processing between the camerasrc and eglstreamsrc. It would help if we could understand what nvvidconv's release_buffer log means, and whether there are other GST_DEBUG categories that can be used to dump more logs. Currently, GST_DEBUG is set to "nvvidconv:6".
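GST_DEBUG accepts a comma-separated list of category:level pairs, so more than one category can be raised at once. The sketch below sets it from C before gst_init() is called. The "nveglstream" category name is taken from the log lines earlier in this thread. "bufferpool" and "GST_STATES" are assumed to be the GStreamer core category names for buffer pools and state changes, and should be checked against the output of gst-launch-1.0 --gst-debug-help on the target. The function name is hypothetical.

```c
#include <stdlib.h>

/* Assumed category list: verify the names on the target before relying on it. */
#define PIPELINE_DEBUG_SPEC "nvvidconv:6,nveglstream:6,bufferpool:6,GST_STATES:5"

/* Must run before gst_init(), which reads GST_DEBUG from the environment.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int configure_gst_debug(void)
{
  return setenv("GST_DEBUG", PIPELINE_DEBUG_SPEC, 1 /* overwrite */);
}
```

The same string can be exported in the shell instead. Setting it in code is mainly useful when the application is started by a launcher that does not pass the environment through.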

Thanks.

As mentioned above, our pipeline is as follows:

ourEglProvider → nveglstreamsrc → nvvidconv → omxh265enc → …

On the first launch, which is successful, the following are the logs from nvvidconv (after omx is ready):

0:04:33.986265568 7787 0x7e3c001b20 DEBUG nvvidconv gstnvvconv.c:716:gst_nv_filter_buffer_pool_release_buffer: release_buffer
0:04:33.986296224 7787 0x7e3c001b20 DEBUG nvvidconv gstnvvconv.c:520:gst_nv_filter_buffer_pool_stop: stop
0:04:33.986310592 7787 0x7e3c001b20 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
0:04:33.986456768 7787 0x7e3c001b20 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
0:04:33.986486432 7787 0x7e3c001b20 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
0:04:33.986628544 7787 0x7e3c001b20 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer

[debug msg from my code] >>> Media in GST_STATE_PLAYING

Pipeline was brought down. Monitored the state changes, etc.

On the next launch of the same pipeline, nvvidconv crashes with the following logs:

0:13:20.138533088 7787 0x7e7000bd90 DEBUG nvvidconv gstnvvconv.c:716:gst_nv_filter_buffer_pool_release_buffer: release_buffer
0:13:20.202395520 7787 0x7e18002800 DEBUG nvvidconv gstnvvconv.c:520:gst_nv_filter_buffer_pool_stop: stop
0:13:20.202444864 7787 0x7e18002800 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
0:13:20.202509760 7787 0x7e18002800 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
0:13:20.202535616 7787 0x7e18002800 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
0:13:20.202866304 7787 0x7e18002800 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
free(): invalid pointer
[/roslaunch_logger] Coredump generated …

We would like to know which buffers nvvidconv is freeing when the pipeline is launched. Do these buffers have anything to do with the input EGL buffers?

Thanks.

Hi,
You may try nvv4l2h265enc, since we have deprecated the omx plugins; the v4l2 plugins can be more stable. By default, 4 buffers are allocated on the source pad of the nvvidconv plugin. It seems to hit a segmentation fault when releasing the last buffer.

Thank you for your info.

Understood that a complicated pipeline is not desirable for locating the issue. Thus, as mentioned at the beginning of this ticket, I used a very simple pipeline to bring out the issue. It is as follows:

ourEglProvider → nveglstreamsrc → nvvidconv → fakesink

Without nvvidconv, the above pipeline can be launched and brought down without any issue.

With nvvidconv added, the second launch of the pipeline sees nvvidconv run into a segmentation fault when it frees an invalid pointer (as indicated in the log messages).

Please note that we need nvvidconv in the pipeline for one reason only: our EGL provider outputs RGBA, while the NVIDIA encoder plugins (omx and v4l2) need NV12.

Our findings:

  1. When our EGL provider detects that the eglStream is in the EGL_STREAM_STATE_DISCONNECTED_KHR state, it destroys the handle. This causes nvvidconv to crash on the very next pipeline launch.
  2. In the same situation, if our EGL provider does not destroy the handle (accepting the memory leak), nvvidconv does not crash, regardless of how many times the pipeline is launched and brought down.

It would be helpful if you could provide:

  1. The proper procedure to prompt nvvidconv to release its buffers.
  2. The resources nvvidconv could still be holding on to after a pipeline is brought down.

Thanks.

Hi, Dane:

Based on the sample code at github.com/DaneLLL/gstreamer_eglstreamsrc, I made some modifications and reproduced the crash that occurs in our pipeline.

The segmentation fault occurred on the second loop, with the following logs:

Render loop success!
GST_STATE_NULL begin
0:00:01.916794112 24287 0x5583f78560 DEBUG nvvidconv gstnvvconv.c:716:gst_nv_filter_buffer_pool_release_buffer: release_buffer
0:00:01.918117792 24287 0x5583f78560 DEBUG nvvidconv gstnvvconv.c:520:gst_nv_filter_buffer_pool_stop: stop
0:00:01.918159520 24287 0x5583f78560 DEBUG nvvidconv gstnvvconv.c:670:gst_nv_filter_buffer_pool_free_buffer: free_buffer
free(): invalid pointer
Aborted (core dumped)

The test code is uploaded here for your testing. Thanks.

daneStream.cpp (7.2 KB)

Thanks for looking into this more directly Sid. Hopefully this will enable Nvidia to figure out what looks like a problem in nvvidconv.

Hi,
Thanks for sharing the test sample. We have one more question about setting up the test: is your eglstream producer always on in the test? We can see that the gstreamer pipeline runs in a loop in the test sample, but we are not sure whether you initialize and destroy the producer in the loop too.

The following are the steps in the sample test:

  1. Launch the pipeline.
  2. Create the EGL handles, eglStream and eglDisplay.
  3. Configure nveglstreamsrc with the above handles and set the pipeline to PLAYING.
  4. Create the producer (basically, create eglContext and eglSurface for rendering in the next step).
  5. Render 20 video frames.
  6. Take down the pipeline (set it to NULL), destroy pipeline.
  7. Destroy eglDisplay, eglStream, eglContext, and eglSurface (the producer is destroyed).
  8. Loop back to step 1.

The short answer is: Yes, the simple producer is initialized and destroyed in the loop.

Note: Our findings seem to point to one fact: nvvidconv will crash if the EGL handles were destroyed in the previous launch of the pipeline.

Thanks.

An additional piece of info.

In our use case (very different from the test sample), the producer is not destroyed and is always online, ready to provide EGL handles and video frames via the EGL interface. The producer runs in its own thread.

The encoding pipeline, running in its own thread, is launched by the RTSP server (created by the RTSP factory) as an RTSP media. nveglstreamsrc is configured in the "media-configure" callback while an RTSP session is being established with a client. The callback asks the provider for the EGL handles (eglStream and eglDisplay) to complete the configuration, after which the entire pipeline can go to the PLAYING state. The provider feeds video frames to the pipeline only while it is in PLAYING.

When an RTSP client disconnects, the pipeline is taken down. nveglstreamsrc releases the EGL handles by setting their state to Disconnected. The producer does not destroy these disconnected handles until it gets a signal from the media destroy callback. This ensures the pipeline is fully taken down before the EGL handles are destroyed.

The producer stays online, waiting for the next request for EGL handles and frame input.
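The ordering rule described above (destroy the EGL handles only once the stream is disconnected and the media destroy callback has fired) can be sketched as a small gate. All names here are hypothetical and the real EGL calls are shown only as a comment. The producer and the pipeline run in different threads, so real code would need to guard this struct with a mutex, which the sketch omits.

```c
#include <stdbool.h>

/* Tracks the two conditions that must both hold before the producer
 * may destroy the EGL stream and display handles. */
typedef struct {
  bool stream_disconnected; /* stream state reported as disconnected */
  bool media_destroyed;     /* pipeline set to NULL and unreffed */
  bool handles_destroyed;   /* guards against destroying twice */
} EglHandleGate;

/* Called when the producer observes the disconnected stream state. */
void gate_on_stream_disconnected(EglHandleGate *g)
{
  g->stream_disconnected = true;
}

/* Called from the media destroy callback, after pipeline teardown. */
void gate_on_media_destroyed(EglHandleGate *g)
{
  g->media_destroyed = true;
}

/* Returns true exactly once, at the point where destruction is safe. */
bool gate_try_destroy(EglHandleGate *g)
{
  if (g->handles_destroyed || !g->stream_disconnected || !g->media_destroyed)
    return false;
  /* Real code would destroy the stream, surface, and context here. */
  g->handles_destroyed = true;
  return true;
}
```

A fresh gate would be created for every RTSP session, so a handle from one session cannot be destroyed by a late callback from another.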

Hi,
Please apply the attachment and check if you can run without hitting the issue.

r32_4x_TEST_libgstnveglstreamsrc.zip (11.3 KB)

Great, we will try this out. What was fixed? What was the root cause?

Thank you. Will try it out.

Would greatly appreciate any findings/info on the issue.

Hi,
It looks to be an issue in nveglstreamsrc. We verify only a one-time launch in the reference samples. The use case of switching between the NULL and PLAYING states is not verified and does not work properly.

Understood. I suppose the fix concerns the allocation and destruction of the nvvidconv buffers. Is it the input (sink) or the output (source) buffers? Is the change very localized? Please advise.

Hi, Dane:

We would like to inform you that the patched plugin you provided (r32_4x_TEST_libgstnveglstreamsrc.zip) works with our pipeline. We would very much like to deploy with this fix. To proceed down that path, we require a clear understanding of the issue itself, the way it is addressed, and the proper release process and schedule.

Please kindly provide the above info, as we have a release that needs urgent attention. Thank you.