Dynamically loading/unloading multiple models with nvinferserver plugin in gstreamer application

Hi,

We are developing a C++ DeepStream application using Triton Inference Server, where we would like to have only one model loaded at a time, but still be able to switch to a different one at runtime via a service call. The issue we are seeing is that if you ever take the nvinferserver plugin from the PAUSED/PLAYING state down to the NULL/READY state for reconfiguration, the TRTIS backend gets stuck in a bad state that you cannot recover from. The problem seems to be that the Triton Inference Server is initialized every time a nvinferserver plugin is brought to PAUSED or PLAYING, and it cannot handle being initialized a second time.

Reading https://github.com/triton-inference-server/server/blob/master/docs/model_management.md it seems like it should be possible to reconfigure to a different model at runtime, but maybe there are some constraints in the interface exposed through the gstreamer plugin that I am overlooking?

I guess my question is: Is this intended behaviour and am I just misusing the SDK? If so, is there a better way to solve the problem? I tried to condense our approach into an example shown below.

• Hardware Platform (Jetson / GPU)
Jetson TX2
• DeepStream Version
5.0.0
• JetPack Version (valid for Jetson only)
4.4.0
• TensorRT Version
7.1.3.0
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
Question/Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Create a gstreamer pipeline in a C++ application that uses a sample configuration from deepstream5.0/samples/configs/deepstream-app-trtis/

We use a pipeline very similar to this:

 filesrc location=streams/sample_1080p_h264.mp4 
 ! qtdemux 
 ! h264parse 
 ! nvv4l2decoder enable-max-performance=true 
 ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 
 ! nvinferserver name=inferserver config-file-path=configs/deepstream-app-trtis/config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt batch-size=1
 ! nvvideoconvert 
 ! nvdsosd 
 ! nvegltransform 
 ! nveglglessink sync=true

Instantiate the pipeline from a string and load the model with a state change

GError *error = NULL;
GstElement *pipeline, *inferserver;
pipeline = gst_parse_launch(pipeline_string, &error);
inferserver = gst_bin_get_by_name(GST_BIN(pipeline), "inferserver");
/* Going to PAUSED makes nvinferserver load the model and start the TRTIS backend */
gst_element_set_state(pipeline, GST_STATE_PAUSED);
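
Since the transition to PAUSED is asynchronous, we wait for it to complete before touching the element again. A minimal sketch of that check, with the error handling simplified compared to our real application:

/* Sketch: make sure parsing succeeded and the model actually finished loading */
if (error != NULL) {
  g_printerr("Failed to parse pipeline: %s\n", error->message);
  g_error_free(error);
  return -1;
}

GstState state, pending;
/* Block until the async transition to PAUSED completes (model loaded) */
GstStateChangeReturn ret =
    gst_element_get_state(pipeline, &state, &pending, GST_CLOCK_TIME_NONE);
if (ret == GST_STATE_CHANGE_FAILURE) {
  g_printerr("Pipeline failed to reach PAUSED\n");
  return -1;
}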

If you then unload the model, change to a different configuration, and load the new model, the TRTIS backend crashes the application, because it gets initialized a second time while it is already running.

/* Unload the current model by taking only the nvinferserver element to NULL */
gst_element_set_state (GST_ELEMENT(inferserver), GST_STATE_NULL);
g_object_set(inferserver, "config-file-path", "configs/deepstream-app-trtis/config_infer_primary_detector_ssd_mobilenet_v1_trailers.txt", NULL);
g_object_set(inferserver, "batch-size", 1, NULL);
/* Bringing the element back to PAUSED loads the new model and re-initializes TRTIS */
gst_element_set_state (GST_ELEMENT(inferserver), GST_STATE_PAUSED);
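
For completeness, this sketch shows roughly how we pull the actual error message off the pipeline bus after the failed reload (simplified compared to the bus watch in our real application):

/* Sketch: the TRTIS initialization error is posted as an ERROR message on the bus */
GstBus *bus = gst_element_get_bus(pipeline);
GstMessage *msg =
    gst_bus_timed_pop_filtered(bus, 5 * GST_SECOND, GST_MESSAGE_ERROR);
if (msg != NULL) {
  GError *err = NULL;
  gchar *debug = NULL;
  gst_message_parse_error(msg, &err, &debug);
  g_printerr("Error from %s: %s\n", GST_OBJECT_NAME(msg->src), err->message);
  g_error_free(err);
  g_free(debug);
  gst_message_unref(msg);
}
gst_object_unref(bus);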

This results in the following errors:

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

No, you are using DeepStream nvinferserver in the wrong way. Unfortunately, nvinferserver does not use triton-inference-server itself, but some lower-level interfaces which triton-inference-server also uses. They are different things. So what you have seen in https://github.com/triton-inference-server/server/blob/master/docs/model_management.md does not apply to deepstream nvinferserver.

Okay, thanks for clearing that up. Am I then correct in assuming that it just isn't possible to reload models at runtime like that? As far as I can tell, it is possible to load a different model in a different nvinferserver element as long as you keep the first element in the PAUSED/PLAYING state (see the sketch below). That does mean that you need to hold both models in memory at the same time, but maybe that is just the way it has to be?
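
For reference, a rough sketch of that workaround as I picture it: a second nvinferserver element is created with the other config file and swapped in while the first one stays in PAUSED, so its backend is never torn down. The element names and the relinking are illustrative only (a pipeline that is actually running would also need pad blocking), not code from our application:

/* Sketch of the both-models-in-memory workaround (names are illustrative) */
GstElement *inferserver2 = gst_element_factory_make("nvinferserver", "inferserver2");
g_object_set(inferserver2,
             "config-file-path",
             "configs/deepstream-app-trtis/config_infer_primary_detector_ssd_mobilenet_v1_trailers.txt",
             "batch-size", 1,
             NULL);
gst_bin_add(GST_BIN(pipeline), inferserver2);

/* Relink around the old element; the original inferserver is kept in PAUSED.
 * "m" is the streammux from the launch string, "nvvideoconvert0" is assumed
 * to be the auto-generated name of the downstream converter. */
GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "m");
GstElement *conv = gst_bin_get_by_name(GST_BIN(pipeline), "nvvideoconvert0");
gst_element_unlink(streammux, inferserver);
gst_element_unlink(inferserver, conv);
gst_element_link_many(streammux, inferserver2, conv, NULL);
gst_element_sync_state_with_parent(inferserver2);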

Currently it cannot.