Deadlock in gst_element_release_request_pad with nvstreammux

I’ve made some progress with a workaround for dynamically adding and removing streams. My particular use case has an nvstreammux → nvinfer → nvstreamdemux section in the middle of the pipeline. Here are some findings:

  1. All of the request pads need to be set up at the beginning.
  2. The request pads must never be released as part of adding and removing streams; instead, they are only ever linked and unlinked.
  3. When a stream finishes, detected by checking each element message on the bus with gst_nvmessage_is_stream_eos and parsing the stream id with gst_nvmessage_parse_stream_eos, start the teardown (see the sketch after this list).
  4. Tearing down involves pausing everything that is running, setting the source and sink that we want to remove to the NULL state, and then unlinking them and removing them from the pipeline.
  5. For adding streams it seems to work better to pause anything that is playing in the pipeline first.
  6. Be sure to use a lock to control the manipulation of the pipeline so that only one thread is modifying it at a time. I lock it for all of the add code, and for all of the remove code.

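To make items 3, 4, and 6 concrete, here is a minimal sketch of the bus handling and teardown. It assumes the DeepStream gst-nvmessage.h header for gst_nvmessage_is_stream_eos / gst_nvmessage_parse_stream_eos; everything else (pipeline, source_bins, sink_bins, mux_sink_pads, demux_src_pads, MAX_STREAMS) is my own bookkeeping, not anything DeepStream provides, and the request pads in those arrays were obtained once at startup and are never released:

```c
#include <gst/gst.h>
#include "gst-nvmessage.h"  /* DeepStream header for gst_nvmessage_* (name per my install) */

#define MAX_STREAMS 16       /* assumption: fixed number of pre-allocated slots */

/* All of this is my own bookkeeping, not DeepStream API: the pipeline,
 * the per-slot source/sink bins, and the mux/demux request pads that were
 * requested once at startup and are never released. */
static GstElement *pipeline;
static GstElement *source_bins[MAX_STREAMS];
static GstElement *sink_bins[MAX_STREAMS];
static GstPad     *mux_sink_pads[MAX_STREAMS];
static GstPad     *demux_src_pads[MAX_STREAMS];
static GMutex      pipeline_lock;   /* only one thread modifies the pipeline at a time */

static void remove_stream (guint stream_id);

/* Bus watch: nvstreammux posts per-stream EOS as an element message; parse
 * out the stream id and start the teardown for that slot. */
static gboolean
bus_cb (GstBus *bus, GstMessage *msg, gpointer user_data)
{
  if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_ELEMENT &&
      gst_nvmessage_is_stream_eos (msg)) {
    guint stream_id = 0;
    if (gst_nvmessage_parse_stream_eos (msg, &stream_id))
      remove_stream (stream_id);
  }
  return TRUE;
}

/* Teardown: pause the pipeline, drive the per-stream source and sink bins
 * to NULL, unlink them from the pre-allocated request pads (which stay
 * requested), remove them from the pipeline, and resume. */
static void
remove_stream (guint stream_id)
{
  GstPad *src_pad, *sink_pad;

  g_mutex_lock (&pipeline_lock);

  gst_element_set_state (pipeline, GST_STATE_PAUSED);

  gst_element_set_state (source_bins[stream_id], GST_STATE_NULL);
  gst_element_set_state (sink_bins[stream_id], GST_STATE_NULL);

  src_pad = gst_element_get_static_pad (source_bins[stream_id], "src");
  gst_pad_unlink (src_pad, mux_sink_pads[stream_id]);
  gst_object_unref (src_pad);

  sink_pad = gst_element_get_static_pad (sink_bins[stream_id], "sink");
  gst_pad_unlink (demux_src_pads[stream_id], sink_pad);
  gst_object_unref (sink_pad);

  gst_bin_remove (GST_BIN (pipeline), source_bins[stream_id]);
  gst_bin_remove (GST_BIN (pipeline), sink_bins[stream_id]);
  source_bins[stream_id] = NULL;
  sink_bins[stream_id] = NULL;

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  g_mutex_unlock (&pipeline_lock);
}
```

The bus watch itself is installed the usual way with gst_bus_add_watch (gst_element_get_bus (pipeline), bus_cb, NULL).
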
I’m still having some issues, though. With two streams being added and removed asynchronously, it works for roughly 12-15 streams added/removed and then errors out with:

0:00:21.177111588  2961 0x55b338a39e30 WARN                 nvinfer gstnvinfer.cpp:1830:gst_nvinfer_output_loop:<pgie> error: Internal data stream error.
0:00:21.177621422  2961 0x55b338a39e30 WARN                 nvinfer gstnvinfer.cpp:1830:gst_nvinfer_output_loop:<pgie> error: streaming stopped, reason error (-5)

I haven’t found anything in the logs that gives more insight into this error. Can someone from Nvidia let us know what that error means? I have also occasionally seen NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY. The GPU isn’t running out of memory.

However, my code works fine if I limit it to only one stream at a time, so for now that looks like what I’ll have to do. At least this way I don’t have to reload the model. I will try queuing up other sources in the playing state, but unlinked, and then linking the new source as soon as the previous source is torn down (sketched below). I don’t yet know how much this will slow down throughput, but it seems like my best option for now.
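Here is a rough sketch of what I have in mind for that queued-source idea. attach_queued_source, next_src_bin, and the pipeline / mux_sink_pads / source_bins / pipeline_lock bookkeeping are the same assumptions as in the earlier sketch, not anything DeepStream provides:

```c
/* Attach a source bin that was created ahead of time to the request pad
 * slot freed by the previous teardown. Names here are my own bookkeeping. */
static void
attach_queued_source (GstElement *next_src_bin, guint slot)
{
  GstPad *src_pad;

  g_mutex_lock (&pipeline_lock);

  gst_element_set_state (pipeline, GST_STATE_PAUSED);

  gst_bin_add (GST_BIN (pipeline), next_src_bin);

  src_pad = gst_element_get_static_pad (next_src_bin, "src");
  if (gst_pad_link (src_pad, mux_sink_pads[slot]) != GST_PAD_LINK_OK)
    g_warning ("failed to link queued source into slot %u", slot);
  gst_object_unref (src_pad);

  /* Let the new bin follow the pipeline's state once everything is linked. */
  gst_element_sync_state_with_parent (next_src_bin);
  source_bins[slot] = next_src_bin;

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  g_mutex_unlock (&pipeline_lock);
}
```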

One other thought that has occurred to me is that there might be a problem with having a request pad that isn’t linked when the middle elements are in the playing state. I’ll investigate this if I find time to do so.
