Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.1.1
• JetPack Version (valid for Jetson only): 5.0.2
• TensorRT Version: 8.4
Hi, I am currently working with a Jetson Xavier NX (Lenovo SE70, 16 GB) with JetPack 5.0.2 and DeepStream 6.1.1.
I want to run different DeepStream pipelines on the same box for different cameras/videos.
Unfortunately, I am not allowed to use multiple inputs in one pipeline for the different streams, so I have to create multiple DeepStream pipelines.
All the pipelines use the same person_detection model, which takes ~900 MB for a single pipeline.
Is there any way to use the same model instance for all the pipelines, with batching, either using the nvinfer or Triton nvinferserver plugins, or using a separate Triton server on the same machine?
You can use nvinferserver's gRPC mode: there will be one tritonserver that loads the model once, and all clients can send requests to have the server do the inference. Please refer to the docs and the sample /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc/source4_1080p_dec_preprocess_infer-resnet_tracker_preprocess_sgie_tiled_display_int8.txt.
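For reference, a minimal nvinferserver gRPC-mode config sketch is shown below; the model name, batch size, and gRPC URL are assumptions for your setup and need to match your Triton model repository and endpoint:

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 4
  backend {
    triton {
      model_name: "person_detection"   # assumed name of the model in the Triton repository
      version: -1
      grpc {
        url: "127.0.0.1:8001"          # assumed local Triton server on the default gRPC port
      }
    }
  }
}

Each pipeline points its nvinferserver element at a config like this, so all of them send requests to the same Triton server and the model is loaded only once.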
I have a few doubts about using Triton along with DeepStream.
I have used the yolov5m model with both nvinfer and nvinferserver. nvinfer gives ~3x better FPS and is ~2x faster than nvinferserver, and memory usage was higher with Triton on the Jetson Xavier NX.
If I use gRPC mode, will the above issue be solved?
Or is there a way to achieve a single model instance for multiple pipelines using nvinfer only?
Also, will the solution you mentioned above work if I have multiple pipelines running at the same time? Will gRPC be able to handle multiple continuous requests with a single model instance?
Please refer to this topic; you can use nvinferserver's CAPI mode, which, compared with gRPC mode, has no network interaction.
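For comparison, in CAPI (native) mode the nvinferserver config points directly at a local model repository instead of a gRPC endpoint; a rough sketch, where the model name and repository path are assumptions:

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 4
  backend {
    triton {
      model_name: "person_detection"          # assumed model name
      version: -1
      model_repo {
        root: "/home/user/triton_model_repo"  # assumed path to the local model repository
        strict_model_config: true
      }
    }
  }
}

In this mode Triton runs in-process with the DeepStream application, so there is no network hop between the plugin and the inference server.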
No. Can you share the root cause of "there are some other things that won't work with multi inputs in a single pipeline"?
Yes. In gRPC mode, tritonserver is a server that loads the model once, and it can serve requests from multiple clients at the same time.
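If you want the concurrent requests coming from the different pipelines to be merged into batches on a single model instance, that is controlled in the model's config.pbtxt; a hedged sketch, where the preferred batch size and queue delay are assumptions:

dynamic_batching {
  preferred_batch_size: [ 4 ]          # assumed batch size to aim for
  max_queue_delay_microseconds: 100    # assumed time to wait while forming a batch
}
instance_group [
  {
    count: 1        # a single model instance serves all clients
    kind: KIND_GPU
  }
]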
I0828 02:29:46.061933 1426151 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x202f0a000' with size 268435456
I0828 02:29:46.062597 1426151 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
[libprotobuf ERROR /raid/home/gitlab-runner/triton/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/third_party/protobuf/src/google/protobuf/text_format.cc:317] Error parsing text-format inference.ModelConfig: 28:1: Expected identifier, got:
E0828 02:29:46.076042 1426151 model_repository_manager.cc:2071] Poll failed for model directory 'yolov5m': failed to read text proto from /home/sensormatic/triton_model_repo/yolov5m/config.pbtxt
I0828 02:29:46.076255 1426151 server.cc:559]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0828 02:29:46.076486 1426151 server.cc:629]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I0828 02:29:46.077180 1426151 server.cc:260] Waiting for in-flight requests to complete.
I0828 02:29:46.077228 1426151 server.cc:276] Timeout 30: Found 0 model versions that have in-flight inferences
I0828 02:29:46.077272 1426151 server.cc:291] All models are stopped, unloading models
I0828 02:29:46.077315 1426151 server.cc:298] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
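The libprotobuf error above means the text protobuf in /home/sensormatic/triton_model_repo/yolov5m/config.pbtxt is malformed at line 28, column 1 (often a stray character, smart quote, or unbalanced brace). For reference, a minimal well-formed config.pbtxt for a TensorRT yolov5m plan could look like the sketch below; the tensor names and dimensions are assumptions for a typical 640x640 export and must match your engine:

name: "yolov5m"
platform: "tensorrt_plan"
max_batch_size: 4
default_model_filename: "yolov5m.engine"
input [
  {
    name: "images"            # assumed input tensor name
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "output"            # assumed output tensor name
    data_type: TYPE_FP32
    dims: [ 25200, 85 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]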
NvDsInferParseYolo is defined in libnvdsinfer_custom_impl_Yolo.so, which is compiled from C++ code. The bounding-box parsing function needs to be customized for the YOLO model; please refer to /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp.
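For reference, the custom parser is hooked into the nvinferserver config through the postprocess and custom_lib sections, roughly as sketched below; the class count, thresholds, label file, and library path are assumptions and must be adapted to your model:

infer_config {
  # ... backend and preprocess sections as in the earlier sketches ...
  postprocess {
    labelfile_path: "labels.txt"          # assumed label file
    detection {
      num_detected_classes: 80            # assumed class count
      custom_parse_bbox_func: "NvDsInferParseYolo"
      nms {
        confidence_threshold: 0.25        # assumed thresholds
        iou_threshold: 0.45
        topk: 300
      }
    }
  }
  custom_lib {
    path: "/path/to/libnvdsinfer_custom_impl_Yolo.so"   # assumed path to the compiled parser library
  }
}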