Running a single model instance across multiple pipelines

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 6.1.1
• JetPack Version (valid for Jetson only) 5.0.2
• TensorRT Version 8.4

Hi, I am currently working with a Jetson Xavier NX (Lenovo SE70, 16 GB) with JetPack 5.0.2 and DeepStream 6.1.1.

  1. I want to run different DeepStream pipelines on the same box for different cameras/videos.
  2. Unfortunately, I am not allowed to use multi-input for different streams, so I have to create multiple DeepStream pipelines.
  3. All the pipelines use the same person_detection model, which takes ~900 MB per pipeline.
  4. Is there any way to use the same model instance for all the pipelines, with batching, either using the nvinfer or Triton nvinferserver plugins, or by using a separate Triton server on the same machine?

Why can't the application use multi-input? deepstream-test3 can accept multiple inputs in one pipeline.

There are some other things that won't work with multiple inputs in a single pipeline.

You can start two applications or use multiple pipelines in one application.

How do I start two applications that use a single model instance?

You can use nvinferserver's gRPC mode: a single tritonserver loads the model once, and all clients send requests to that server to do the inference. Please refer to the doc and the sample /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc/source4_1080p_dec_preprocess_infer-resnet_tracker_preprocess_sgie_tiled_display_int8.txt
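For reference, a minimal nvinferserver gRPC-mode config could look roughly like the sketch below. The grpc { url } block is what makes every pipeline (or every separate application) send its requests to the same tritonserver instance; the model name, class count, labels file and thresholds are placeholders that have to match your own person_detection model.

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 4
  backend {
    triton {
      model_name: "person_detection"  # placeholder, the model name served by Triton
      version: -1
      grpc {
        url: "localhost:8001"         # all pipelines/apps point at the same Triton endpoint
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize {
      scale_factor: 0.0039215686      # 1/255, assuming the model expects 0..1 input
    }
  }
  postprocess {
    labelfile_path: "labels.txt"      # placeholder
    detection {
      num_detected_classes: 1         # placeholder, match your model
      nms {
        confidence_threshold: 0.3
        iou_threshold: 0.5
        topk: 100
      }
    }
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}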

@fanzh thanks for the suggestion.

I have a few doubts about using Triton along with DeepStream.

  1. I have used the yolov5m model with both nvinfer and nvinferserver. nvinfer gives about 3x better FPS and is about 2x faster than nvinferserver, and memory usage was higher with Triton on the Jetson Xavier NX.
  2. If I use gRPC mode, will the above issue be solved?
  3. Or is there a way to achieve a single model instance for multiple pipelines using nvinfer only?
  4. And will the solution you mentioned above work if I have multiple pipelines running at the same time? Will gRPC be able to handle multiple continuous requests with a single model instance?

Please refer to this topic. You can use nvinferserver's CAPI mode; compared with gRPC mode there is no network interaction.
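For comparison, in CAPI mode the backend section of the nvinferserver config points at a local model repository instead of a gRPC endpoint, along these lines (the model name and repository path are placeholders); the Triton core then runs inside the DeepStream process, so there is no network hop:

backend {
  triton {
    model_name: "person_detection"            # placeholder
    version: -1
    model_repo {
      root: "/home/user/triton_model_repo"    # placeholder path to your Triton model repository
      strict_model_config: true
    }
  }
}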

No. Can you share the root cause of "there are some other things that won't work with multiple inputs in a single pipeline"?

Yes. In gRPC mode, tritonserver is a server which will load the model once. The server can support requests from multiple clients at the same time.

We cannot use multi-input because of our architecture constraints.

Hi, I have tried to run the Triton server with the yolov5m model.
I got the error below; the same model works with nvinferserver in DeepStream.

config.pbtxt

name: "yolov5m"
platform: "tensorrt_plan"
max_batch_size: 4
default_model_filename: "model_b4_gpu0_fp32.engine"
input [
  {
    name: "data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [3, 640, 640]
  }
]

$ ./tritonserver --model-repository=/home/sensormatic/triton_model_repo --backend-directory=/opt/nvidia/deepstream/deepstream/lib/triton_backends/

I0828 02:29:46.061933 1426151 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x202f0a000' with size 268435456
I0828 02:29:46.062597 1426151 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
[libprotobuf ERROR /raid/home/gitlab-runner/triton/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/third_party/protobuf/src/google/protobuf/text_format.cc:317] Error parsing text-format inference.ModelConfig: 28:1: Expected identifier, got:
E0828 02:29:46.076042 1426151 model_repository_manager.cc:2071] Poll failed for model directory 'yolov5m': failed to read text proto from /home/sensormatic/triton_model_repo/yolov5m/config.pbtxt
I0828 02:29:46.076255 1426151 server.cc:559]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0828 02:29:46.076360 1426151 server.cc:586]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0828 02:29:46.076486 1426151 server.cc:629]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

W0828 02:29:46.076558 1426151 metrics.cc:324] Neither cache metrics nor gpu metrics are enabled. Not polling for them.
I0828 02:29:46.076976 1426151 tritonserver.cc:2176]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.24.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
| | or_data statistics trace |
| model_repository_path[0] | /home/sensormatic/triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 5.3 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0828 02:29:46.077180 1426151 server.cc:260] Waiting for in-flight requests to complete.
I0828 02:29:46.077228 1426151 server.cc:276] Timeout 30: Found 0 model versions that have in-flight inferences
I0828 02:29:46.077272 1426151 server.cc:291] All models are stopped, unloading models
I0828 02:29:46.077315 1426151 server.cc:298] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

The log shows the error hint: Triton failed to parse config.pbtxt. Please refer to a yolov5 model config sample and the doc.
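In case it helps, the "Expected identifier" parse error often points to a stray or non-ASCII character (for example typographic quotes picked up from copy/paste) somewhere in config.pbtxt. A parseable config for a TensorRT plan usually looks along these lines; the output tensor name and dims here are hypothetical and must match your exported yolov5m engine:

name: "yolov5m"
platform: "tensorrt_plan"
max_batch_size: 4
default_model_filename: "model_b4_gpu0_fp32.engine"
input [
  {
    name: "data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [3, 640, 640]
  }
]
output [
  {
    name: "prob"          # hypothetical output tensor name, check your engine
    data_type: TYPE_FP32
    dims: [25200, 85]     # hypothetical, must match the engine's output shape
  }
]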

@fanzh thank you, I am able to run the Triton server now using:

$ LD_PRELOAD=/home/user/Neo/deepstream_python_apps/apps/deepstream-occupancy/person-head-detection/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so ./tritonserver --model-repository=/home/user/triton_model_repo --backend-directory=/opt/nvidia/deepstream/deepstream/lib/triton_backends/ --allow-grpc=1

@fanzh running the DeepStream app with nvinferserver in gRPC mode is not giving any detections; when an object appears it gives this error:

0:00:23.925216093 2033209 0xfffee0079240 DEBUG v4l2bufferpool gstv4l2bufferpool.c:2077:gst_v4l2_buffer_pool_process:<nvv4l2decoder0:pool:sink> process buffer 0xfffec80151a8Segmentation fault (core dumped)

Can you use gdb to debug? What is the call stack of the crash?

@fanzh I am using python3.

  1. Can you share the whole pipeline and the configuration files of nvinfer and nvinferserver?
  2. The DeepStream SDK is C/C++ code; you can try gdb with python3 to check whether it crashed in the C/C++ code (see the example below).
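A typical way to capture the backtrace, assuming the app is started as python3 occupancy.py, would be roughly:

$ gdb --args python3 occupancy.py
(gdb) run
... reproduce the segmentation fault ...
(gdb) bt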

@fanzh please find the python3 pipeline code and the nvinfer/nvinferserver config files.

python3 code
occupancy.py (20.0 KB)

nvanalytics config
config_nvdsanalytics.txt (3.6 KB)

(This one gives me outputs) Triton config without running the Triton server as a separate service
yolov5_pgie_nvinferserver_config.txt (2.1 KB)

Triton config with Triton running as a separate service in gRPC mode.
yolov5_pgie_nvinferserver_grpc_config.txt (2.0 KB)

Did you see the bounding boxes? You can add a log in NvDsInferParseYolo to check whether it crashed in this function.

Actually I am not familiar with C++ code, @fanzh.

NvDsInferParseYolo is defined in libnvdsinfer_custom_impl_Yolo.so, which is compiled from C++ code. The box-parsing function needs to be customized for the yolo model; please refer to /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp.