Dynamically loading/unloading multiple models with nvinferserver plugin in gstreamer application

Hi,

We are developing a C++ DeepStream application using Triton Inference Server, where we would like to have only one model loaded at a time, but still be able to switch to a different one at runtime via a service call. The issue we are seeing is that if you ever take the nvinferserver plugin from the PAUSED/PLAYING state down to the NULL/READY state for reconfiguration, the TRTIS backend gets stuck in a bad state that you cannot recover from. The problem seems to be that the Triton Inference Server is initialized every time a nvinferserver plugin is brought to PAUSED or PLAYING, and it cannot handle being initialized a second time.

Reading https://github.com/triton-inference-server/server/blob/master/docs/model_management.md it seems like it should be possible to reconfigure to a different model at runtime, but maybe there are some constraints in the interface exposed through the gstreamer plugin that I am overlooking?

I guess my question is: Is this intended behaviour and am I just misusing the SDK? If so, is there a better way to solve the problem? I tried to condense our approach into an example shown below.

• Hardware Platform (Jetson / GPU)
Jetson TX2
• DeepStream Version
5.0.0
• JetPack Version (valid for Jetson only)
4.4.0
• TensorRT Version
7.1.3.0
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
Question/Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Create a gstreamer pipeline in a C++ application that uses a sample configuration from deepstream5.0/samples/configs/deepstream-app-trtis/

We use a pipeline very similar to this:

 filesrc location=streams/sample_1080p_h264.mp4 
 ! qtdemux 
 ! h264parse 
 ! nvv4l2decoder enable-max-performance=true 
 ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 
 ! nvinferserver name=inferserver config-file-path=configs/deepstream-app-trtis/config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt batch-size=1
 ! nvvideoconvert 
 ! nvdsosd 
 ! nvegltransform 
 ! nveglglessink sync=true

Instantiate the pipeline from a string and load the model with a state change

GError *error = NULL;
GstElement *pipeline, *inferserver;
pipeline = gst_parse_launch(pipeline_string, &error);
inferserver = gst_bin_get_by_name(GST_BIN(pipeline), "inferserver");
/* Going to PAUSED makes nvinferserver load the model and start the TRTIS backend */
gst_element_set_state(pipeline, GST_STATE_PAUSED);
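
Since the transition to PAUSED is asynchronous, we wait for it to complete before touching the element again. A minimal sketch of that check, with the error handling simplified compared to our real application:

/* Sketch: make sure parsing succeeded and the model actually finished loading */
if (error != NULL) {
  g_printerr("Failed to parse pipeline: %s\n", error->message);
  g_error_free(error);
  return -1;
}

GstState state, pending;
/* Block until the async transition to PAUSED completes (model loaded) */
GstStateChangeReturn ret =
    gst_element_get_state(pipeline, &state, &pending, GST_CLOCK_TIME_NONE);
if (ret == GST_STATE_CHANGE_FAILURE) {
  g_printerr("Pipeline failed to reach PAUSED\n");
  return -1;
}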

If you then unload the model, change to a different configuration, and load the new model, the TRTIS backend crashes the application, because it gets initialized a second time while it is already running.

/* Unload the current model by taking only the nvinferserver element to NULL */
gst_element_set_state (GST_ELEMENT(inferserver), GST_STATE_NULL);
g_object_set(inferserver, "config-file-path", "configs/deepstream-app-trtis/config_infer_primary_detector_ssd_mobilenet_v1_trailers.txt", NULL);
g_object_set(inferserver, "batch-size", 1, NULL);
/* Bringing the element back to PAUSED loads the new model and re-initializes TRTIS */
gst_element_set_state (GST_ELEMENT(inferserver), GST_STATE_PAUSED);
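
For completeness, this sketch shows roughly how we pull the actual error message off the pipeline bus after the failed reload (simplified compared to the bus watch in our real application):

/* Sketch: the TRTIS initialization error is posted as an ERROR message on the bus */
GstBus *bus = gst_element_get_bus(pipeline);
GstMessage *msg =
    gst_bus_timed_pop_filtered(bus, 5 * GST_SECOND, GST_MESSAGE_ERROR);
if (msg != NULL) {
  GError *err = NULL;
  gchar *debug = NULL;
  gst_message_parse_error(msg, &err, &debug);
  g_printerr("Error from %s: %s\n", GST_OBJECT_NAME(msg->src), err->message);
  g_error_free(err);
  g_free(debug);
  gst_message_unref(msg);
}
gst_object_unref(bus);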

This results in the following errors:

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

No, you are using DeepStream nvinferserver in the wrong way. Unfortunately, nvinferserver does not use triton-inference-server itself, but some lower-level interfaces which triton-inference-server also uses. They are different things. So what you have seen in https://github.com/triton-inference-server/server/blob/master/docs/model_management.md does not apply to deepstream nvinferserver.

Okay, thanks for clearing that up. Am I then correct in assuming that it just isn't possible to reload models at runtime like that? As far as I can tell, it is possible to load a different model in a different nvinferserver element as long as you keep the first element in the PAUSED/PLAYING state (see the sketch below). That does mean that you need to hold both models in memory at the same time, but maybe that is just the way it has to be?
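
For reference, a rough sketch of that workaround as I picture it: a second nvinferserver element is created with the other config file and swapped in while the first one stays in PAUSED, so its backend is never torn down. The element names and the relinking are illustrative only (a pipeline that is actually running would also need pad blocking), not code from our application:

/* Sketch of the both-models-in-memory workaround (names are illustrative) */
GstElement *inferserver2 = gst_element_factory_make("nvinferserver", "inferserver2");
g_object_set(inferserver2,
             "config-file-path",
             "configs/deepstream-app-trtis/config_infer_primary_detector_ssd_mobilenet_v1_trailers.txt",
             "batch-size", 1,
             NULL);
gst_bin_add(GST_BIN(pipeline), inferserver2);

/* Relink around the old element; the original inferserver is kept in PAUSED.
 * "m" is the streammux from the launch string, "nvvideoconvert0" is assumed
 * to be the auto-generated name of the downstream converter. */
GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "m");
GstElement *conv = gst_bin_get_by_name(GST_BIN(pipeline), "nvvideoconvert0");
gst_element_unlink(streammux, inferserver);
gst_element_unlink(inferserver, conv);
gst_element_link_many(streammux, inferserver2, conv, NULL);
gst_element_sync_state_with_parent(inferserver2);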

Currently it cannot.