When we run the Triton Inference Server on Jetson Orin, we get an "unable to load model" error.

• Hardware Platform (Jetson / GPU) -Jetson Orin
• DeepStream Version - 6.4
• JetPack Version (valid for Jetson only) - 6.0+b106
• TensorRT Version - 8.6.2
• NVIDIA GPU Driver Version (valid for GPU only) - 12.2
• Issue Type( questions, new requirements, bugs) - bug: we are trying to deploy our model to the Triton server but get an "unable to load model" error; the peoplenet_resnet32 model is used, so kindly help us resolve the problem

Please advise the right way to deploy the Triton server on Jetson Orin with CUDA.
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

jetson@ubuntu:~/server$ sudo docker run --rm --runtime=nvidia --net=host --gpus all -v /home/jetson/server/model_repository:/models nvcr.io/nvidia/tritonserver:23.10-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==

NVIDIA Release 23.10 (build 72127510)
Triton Server Version 2.39.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

Failed to detect NVIDIA driver version.

I0628 12:48:49.197002 1 pinned_memory_manager.cc:241] Pinned memory pool is created at ‘0x203eae000’ with size 268435456
I0628 12:48:49.197413 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
E0628 12:48:49.203240 1 model_repository_manager.cc:1309] Poll failed for model directory ‘models’: Invalid model name: Could not determine backend for model ‘models’ with no backend in model configuration. Expected model name of the form ‘model.<backend_name>’.
I0628 12:48:49.205591 1 model_lifecycle.cc:461] loading: peoplenet:1
I0628 12:48:49.289920 1 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0628 12:48:49.289977 1 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.16
I0628 12:48:49.289985 1 tensorrt.cc:81] ‘tensorrt’ TRITONBACKEND API version: 1.16
I0628 12:48:49.289992 1 tensorrt.cc:105] backend configuration:
{“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/tritonserver/backends”,“min-compute-capability”:“6.000000”,“default-max-batch-size”:“4”}}
I0628 12:48:49.290495 1 tensorrt.cc:231] TRITONBACKEND_ModelInitialize: peoplenet (version 1)
I0628 12:48:49.333198 1 logging.cc:46] Loaded engine size: 22 MiB
E0628 12:48:49.340913 1 logging.cc:40] 6: The engine plan file is not compatible with this version of TensorRT, expecting library version 8.6.1.6 got 8.6.2.3, please rebuild.
E0628 12:48:49.359902 1 logging.cc:40] 2: [engine.cpp::deserializeEngine::951] Error Code 2: Internal Error (Assertion engine->deserialize(start, size, allocator, runtime) failed. )
I0628 12:48:49.364211 1 tensorrt.cc:274] TRITONBACKEND_ModelFinalize: delete model state
E0628 12:48:49.364280 1 model_lifecycle.cc:621] failed to load ‘peoplenet’ version 1: Internal: unable to load plan file to auto complete config: /models/peoplenet/1/model.engine
I0628 12:48:49.364306 1 model_lifecycle.cc:756] failed to load ‘peoplenet’
I0628 12:48:49.364469 1 server.cc:592]
±-----------------±-----+
| Repository Agent | Path |
±-----------------±-----+
±-----------------±-----+

I0628 12:48:49.364541 1 server.cc:619]
±---------±----------------------------------------------------------±--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
±---------±----------------------------------------------------------±--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/tritonserver/backends”,“min-compute-capability”:“6.000000”,“default-max-batch-size”:“4”}} |
±---------±----------------------------------------------------------±--------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0628 12:48:49.364585 1 server.cc:662]
±----------±--------±----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
±----------±--------±----------------------------------------------------------------------------------------------------------+
| peoplenet | 1 | UNAVAILABLE: Internal: unable to load plan file to auto complete config: /models/peoplenet/1/model.engine |
±----------±--------±----------------------------------------------------------------------------------------------------------+

Driver is unsupported. Must be at least 384.00.
W0628 12:48:49.375665 1 metrics.cc:738] DCGM unable to start: DCGM initialization error
I0628 12:48:49.376266 1 metrics.cc:710] Collecting CPU metrics
I0628 12:48:49.376567 1 tritonserver.cc:2458]
±---------------------------------±----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
±---------------------------------±----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
±---------------------------------±----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0628 12:48:49.376589 1 server.cc:293] Waiting for in-flight requests to complete.
I0628 12:48:49.376600 1 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0628 12:48:49.376609 1 server.cc:324] All models are stopped, unloading models
I0628 12:48:49.376615 1 server.cc:331] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
jetson@ubuntu:~/server$ nvidia-smi
Fri Jun 28 18:21:33 2024
±--------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0 Driver Version: N/A CUDA Version: 12.2 |
|-----------------------------------------±---------------------±---------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Orin (nvgpu) N/A | N/A N/A | N/A |
| N/A N/A N/A N/A / N/A | Not Supported | N/A N/A |
| | | N/A |
±----------------------------------------±---------------------±---------------------+

±--------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
±--------------------------------------------------------------------------------------+

As the log shows, the engine was not built for TensorRT 8.6.2.3 (it expects 8.6.1.6). Please recreate the TensorRT engine on the device.
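The plan file must be (re)built with the same TensorRT version that Triton links against, i.e. on the Orin itself. As a rough illustration only (not the official workflow), here is a minimal Python sketch using the TensorRT API, assuming you have an ONNX export of the model; the file names are placeholders, and a TAO .etlt model would instead be converted on the device with tao-converter or trtexec.

# Minimal sketch: rebuild the TensorRT plan on the Jetson itself so it matches
# the installed TensorRT (8.6.2.3). Assumes an ONNX export of the model is
# available; the paths below are placeholders.
import tensorrt as trt

ONNX_PATH = "peoplenet.onnx"     # hypothetical ONNX export
ENGINE_PATH = "model.engine"     # goes under <model_repository>/peoplenet/1/

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)

serialized = builder.build_serialized_network(network, config)
with open(ENGINE_PATH, "wb") as f:
    f.write(serialized)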

I would like to know which Triton version is supported with DeepStream 7.0, CUDA 12.2, and TensorRT 8.6.2, and how the Triton server can utilize all GPUs.

Please refer to /opt/nvidia/deepstream/deepstream/samples/triton_backend_setup.sh, which will download the corresponding Triton version.
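After running that script and starting the server, you can confirm which Triton build is actually serving by querying its metadata from a client; a quick sketch, assuming gRPC is reachable on localhost:8001:

# Quick check of the running Triton build (assumes gRPC on localhost:8001).
import tritonclient.grpc as tritongrpcclient

client = tritongrpcclient.InferenceServerClient(url="localhost:8001")
meta = client.get_server_metadata()
print("Triton version:", meta.version)          # e.g. 2.44.0 in the later logs
print("Server ready:", client.is_server_ready())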

OK. If I get any error, I will let you know.

jetson@ubuntu:/opt/tritonserver$ sudo CUDA_VISIBLE_DEVICES=0,1,2 /opt/tritonserver/bin/tritonserver --model-repository=/opt/tritonserver/triton_model_repo
I0702 04:42:28.413897 696991 pinned_memory_manager.cc:275] Pinned memory pool is created at ‘0x203eae000’ with size 268435456
I0702 04:42:28.414293 696991 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0702 04:42:28.420352 696991 model_lifecycle.cc:469] loading: peoplenet:1
I0702 04:42:28.507809 696991 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0702 04:42:28.507887 696991 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.19
I0702 04:42:28.507901 696991 tensorrt.cc:81] ‘tensorrt’ TRITONBACKEND API version: 1.19
I0702 04:42:28.507914 696991 tensorrt.cc:105] backend configuration:
{“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/tritonserver/backends”,“min-compute-capability”:“5.300000”,“default-max-batch-size”:“4”}}
I0702 04:42:28.508507 696991 tensorrt.cc:231] TRITONBACKEND_ModelInitialize: peoplenet (version 1)
I0702 04:42:28.544145 696991 logging.cc:46] Loaded engine size: 22 MiB
W0702 04:42:28.549772 696991 logging.cc:43] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
I0702 04:42:28.580480 696991 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +21, now: CPU 0, GPU 21 (MiB)
I0702 04:42:28.586452 696991 tensorrt.cc:297] TRITONBACKEND_ModelInstanceInitialize: peoplenet_0 (GPU device 0)
I0702 04:42:28.609630 696991 logging.cc:46] Loaded engine size: 22 MiB
W0702 04:42:28.609943 696991 logging.cc:43] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
I0702 04:42:28.644510 696991 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +21, now: CPU 0, GPU 21 (MiB)
I0702 04:42:28.651003 696991 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +37, now: CPU 0, GPU 58 (MiB)
I0702 04:42:28.654293 696991 instance_state.cc:188] Created instance peoplenet_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0702 04:42:28.654712 696991 model_lifecycle.cc:835] successfully loaded ‘peoplenet’
I0702 04:42:28.654854 696991 server.cc:607]
±-----------------±-----+
| Repository Agent | Path |
±-----------------±-----+
±-----------------±-----+

I0702 04:42:28.654954 696991 server.cc:634]
±---------±----------------------------------------------------------±--------------------------------------------------------------------------+
| Backend | Path | Config |
±---------±----------------------------------------------------------±--------------------------------------------------------------------------+
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/trito |
| | | nserver/backends”,“min-compute-capability”:“5.300000”,“default-max-batch- |
| | | size”:“4”}} |
±---------±----------------------------------------------------------±--------------------------------------------------------------------------+

I0702 04:42:28.655008 696991 server.cc:677]
±----------±--------±-------+
| Model | Version | Status |
±----------±--------±-------+
| peoplenet | 1 | READY |
±----------±--------±-------+

I0702 04:42:28.655311 696991 tritonserver.cc:2538]
±---------------------------------±---------------------------------------------------------------------------------------------------------------+
| Option | Value |
±---------------------------------±---------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.44.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configurati |
| | on system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /opt/tritonserver/triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 5.3 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
±---------------------------------±---------------------------------------------------------------------------------------------------------------+

I0702 04:42:28.657772 696991 grpc_server.cc:2466] Started GRPCInferenceService at 0.0.0.0:8001
I0702 04:42:28.658135 696991 http_server.cc:4636] Started HTTPService at 0.0.0.0:8000
I0702 04:42:28.700809 696991 http_server.cc:320] Started Metrics Service at 0.0.0.0:8002

After getting this far, how can I get the output of the PeopleNet (ResNet34) model?

name: "peoplenet"
platform: "tensorrt_plan"
max_batch_size: 3
default_model_filename: "model.engine"
input [
  {
    name: "input_1:0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 544, 960 ]
  }
]
output [
  {
    name: "output_bbox/BiasAdd:0"
    data_type: TYPE_FP32
    dims: [ 12, 34, 60 ]
  },
  {
    name: "output_cov/Sigmoid:0"
    data_type: TYPE_FP32
    dims: [ 3, 34, 60 ]
  }
]

From the log, the Triton server succeeded in loading the engine. What do you mean by "after getting this how can i get output of peoplenet algorithm resnet 34"?

We need to get bounding boxes with their corresponding labels.

output [
  {
    name: "output_bbox/BiasAdd:0"
    data_type: TYPE_FP32
    dims: [ 12, 34, 60 ]
  },
  {
    name: "output_cov/Sigmoid:0"
    data_type: TYPE_FP32
    dims: [ 3, 34, 60 ]
  }
]

We are getting only the raw layer outputs listed above.

Please refer to this ready-made PeopleNet Triton sample.
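For reference, PeopleNet is a DetectNet_v2 detector: output_cov/Sigmoid is a per-class coverage (confidence) grid and output_bbox/BiasAdd carries four box coordinates per class on the same 34x60 grid, so the client has to decode that grid itself (or let DeepStream's nvinferserver do it with the TAO bbox parser). Below is a rough, unofficial Python sketch of the decode; the stride (16) and bbox normalization (35.0) follow the usual TAO DetectNet_v2 postprocessing and should be verified against the sample before relying on the numbers.

# Rough sketch of DetectNet_v2 (PeopleNet) grid decoding on the client side.
# Assumptions: cov has shape (3, 34, 60), bbox has shape (12, 34, 60), and
# stride = 16, bbox_norm = 35.0 as in the usual TAO postprocessing; verify
# these constants against the TAO/Triton PeopleNet sample for your model.
import numpy as np

def decode_detectnet_v2(cov, bbox, stride=16.0, bbox_norm=35.0, conf_thresh=0.4):
    num_classes, grid_h, grid_w = cov.shape
    # Grid-cell centers, normalized by bbox_norm as in the TAO reference code.
    cx = (np.arange(grid_w) * stride + 0.5) / bbox_norm
    cy = (np.arange(grid_h) * stride + 0.5) / bbox_norm
    detections = []
    for c in range(num_classes):
        ys, xs = np.where(cov[c] > conf_thresh)
        for y, x in zip(ys, xs):
            o1, o2, o3, o4 = bbox[c * 4: c * 4 + 4, y, x]
            x1 = (o1 - cx[x]) * -bbox_norm
            y1 = (o2 - cy[y]) * -bbox_norm
            x2 = (o3 + cx[x]) * bbox_norm
            y2 = (o4 + cy[y]) * bbox_norm
            detections.append((c, float(cov[c, y, x]), x1, y1, x2, y2))
    # Neighbouring grid cells fire for the same object, so cluster or NMS the
    # boxes (e.g. cv2.dnn.NMSBoxes) before using them.
    return detections

Class indices 0-2 correspond to PeopleNet's default labels (person, bag, face).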

jetson@ubuntu:~/Documents/tritonserver$ sudo CUDA_VISIBLE_DEVICES=0,1,2 /opt/tritonserver/bin/tritonserver --model-repository=/opt/tritonserver/triton_model_repo
[sudo] password for jetson:
Sorry, try again.
[sudo] password for jetson:
Sorry, try again.
[sudo] password for jetson:
I0704 15:00:11.663173 244283 pinned_memory_manager.cc:275] Pinned memory pool is created at ‘0x203eae000’ with size 268435456
I0704 15:00:11.663761 244283 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0704 15:00:11.674965 244283 model_lifecycle.cc:469] loading: lpd:1
I0704 15:00:11.783901 244283 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0704 15:00:11.783989 244283 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.19
I0704 15:00:11.784001 244283 tensorrt.cc:81] ‘tensorrt’ TRITONBACKEND API version: 1.19
I0704 15:00:11.784017 244283 tensorrt.cc:105] backend configuration:
{“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/tritonserver/backends”,“min-compute-capability”:“5.300000”,“default-max-batch-size”:“4”}}
I0704 15:00:11.784854 244283 tensorrt.cc:231] TRITONBACKEND_ModelInitialize: lpd (version 1)
I0704 15:00:11.804031 244283 logging.cc:46] Loaded engine size: 4 MiB
W0704 15:00:11.809195 244283 logging.cc:43] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
I0704 15:00:11.882547 244283 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +5, GPU +5, now: CPU 57, GPU 3455 (MiB)
I0704 15:00:11.890345 244283 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +2, GPU +0, now: CPU 59, GPU 3455 (MiB)
I0704 15:00:11.893301 244283 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2, now: CPU 0, GPU 2 (MiB)
I0704 15:00:11.906064 244283 tensorrt.cc:297] TRITONBACKEND_ModelInstanceInitialize: lpd_0_0 (GPU device 0)
I0704 15:00:11.912324 244283 logging.cc:46] Loaded engine size: 4 MiB
W0704 15:00:11.912668 244283 logging.cc:43] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
I0704 15:00:11.958592 244283 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 59, GPU 3457 (MiB)
I0704 15:00:11.960080 244283 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 59, GPU 3457 (MiB)
I0704 15:00:11.961987 244283 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2, now: CPU 0, GPU 2 (MiB)
I0704 15:00:11.963173 244283 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 50, GPU 3457 (MiB)
I0704 15:00:11.964382 244283 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 50, GPU 3457 (MiB)
I0704 15:00:11.988483 244283 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +197, now: CPU 0, GPU 199 (MiB)
I0704 15:00:12.008294 244283 instance_state.cc:188] Created instance lpd_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0704 15:00:12.009279 244283 model_lifecycle.cc:835] successfully loaded ‘lpd’
I0704 15:00:12.009472 244283 server.cc:607]
±-----------------±-----+
| Repository Agent | Path |
±-----------------±-----+
±-----------------±-----+

I0704 15:00:12.009610 244283 server.cc:634]
±---------±-------------------------------±-------------------------------+
| Backend | Path | Config |
±---------±-------------------------------±-------------------------------+
| tensorrt | /opt/tritonserver/backends/ten | {“cmdline”:{“auto-complete-con |
| | sorrt/libtriton_tensorrt.so | fig”:“true”,“backend-directory |
| | | “:”/opt/tritonserver/backends” |
| | | ,“min-compute-capability”:“5.3 |
| | | 00000”,"default-max-batch-size |
| | | ":“4”}} |
| | | |
±---------±-------------------------------±-------------------------------+

I0704 15:00:12.009682 244283 server.cc:677]
±------±--------±-------+
| Model | Version | Status |
±------±--------±-------+
| lpd | 1 | READY |
±------±--------±-------+

I0704 15:00:12.009976 244283 tritonserver.cc:2538]
±---------------------------------±-----------------------------------------+
| Option | Value |
±---------------------------------±-----------------------------------------+
| server_id | triton |
| server_version | 2.44.0 |
| server_extensions | classification sequence model_repository |
| | model_repository(unload_dependents) sch |
| | edule_policy model_configuration system_ |
| | shared_memory cuda_shared_memory binary_ |
| | tensor_data parameters statistics trace |
| | logging |
| model_repository_path[0] | /opt/tritonserver/triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 5.3 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
±---------------------------------±-----------------------------------------+

I0704 15:00:12.024565 244283 grpc_server.cc:2466] Started GRPCInferenceService at 0.0.0.0:8001
I0704 15:00:12.025143 244283 http_server.cc:4636] Started HTTPService at 0.0.0.0:8000
I0704 15:00:12.077942 244283 http_server.cc:320] Started Metrics Service at 0.0.0.0:8002

name: "lpd"
platform: "tensorrt_plan"
max_batch_size: 16
default_model_filename: "yolov4_tiny_usa_deployable.etlt_b16_gpu0_fp16.engine"

input [
  {
    name: "Input"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 480, 640 ]
  }
]

output [
  {
    name: "BatchedNMS"
    data_type: TYPE_INT32
    dims: [ 1 ]
  },
  {
    name: "BatchedNMS_1"
    data_type: TYPE_FP32
    dims: [ 200, 4 ]
  },
  {
    name: "BatchedNMS_2"
    data_type: TYPE_FP32
    dims: [ 200 ]
  },
  {
    name: "BatchedNMS_3"
    data_type: TYPE_FP32
    dims: [ 200 ]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: 0
  }
]

model path
triton_model_repo/lpd/1/yolov4_tiny_usa_deployable.etlt_b16_gpu0_fp16.engine

We are running LPD YOLOv4-tiny for number-plate detection on the Triton server. We are able to load the model, but after that we cannot get the plate detections. Can you give a solution so that we get exactly the number-plate output?

import sys
import numpy as np
import tritonclient.grpc as tritongrpcclient
import cv2

# Load and preprocess the image
image_path = "test_image.jpg"
im = cv2.imread(image_path)
im_resized = cv2.resize(im, (640, 480))  # Resize to match model input dimensions
im_batch = np.expand_dims(im_resized.transpose(2, 0, 1), axis=0).astype(np.float32)  # Shape: [1, 3, 480, 640]

url = 'localhost:8001'
model_name = 'lpd'

try:
    triton_client = tritongrpcclient.InferenceServerClient(
        url=url,
        verbose=False
    )
except Exception as e:
    print("Failed to connect to Triton server: " + str(e))
    sys.exit(1)

inputs = []
outputs = []

# Create input object for the model
input_data = tritongrpcclient.InferInput('Input', im_batch.shape, "FP32")
input_data.set_data_from_numpy(im_batch)
inputs.append(input_data)

# Request specific outputs from the model
outputs.append(tritongrpcclient.InferRequestedOutput('BatchedNMS'))
outputs.append(tritongrpcclient.InferRequestedOutput('BatchedNMS_1'))
outputs.append(tritongrpcclient.InferRequestedOutput('BatchedNMS_2'))
outputs.append(tritongrpcclient.InferRequestedOutput('BatchedNMS_3'))

# Perform inference
try:
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=outputs)
    print("Inference successful!")

    # Get and print the shapes of output tensors
    bboxes = results.as_numpy('BatchedNMS_1')
    print("Output 'BatchedNMS_1' shape:", bboxes.shape)

    scores = results.as_numpy('BatchedNMS_2')
    print("Output 'BatchedNMS_2' shape:", scores.shape)

except Exception as e:
    print("Inference failed: " + str(e))
    sys.exit(1)
Inference successful!
Output ‘BatchedNMS_1’ shape: (1, 200, 4)
Output ‘BatchedNMS_2’ shape: (1, 200)

Please refer to this ready-made yolov4-tiny-usa Triton sample.
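For completeness, the BatchedNMS head has already done the decoding: BatchedNMS holds the number of valid detections, BatchedNMS_1 the boxes, BatchedNMS_2 the scores, and BatchedNMS_3 the class indices, so the client only needs to slice by the valid count and map class ids to label strings. A minimal, hedged continuation of the client script above, assuming the boxes come out normalized in [x1, y1, x2, y2] order as is typical for the TAO BatchedNMS plugin (verify against your engine and lpd_labels.txt):

# Continuation of the client above: turn the BatchedNMS outputs into labeled
# boxes. Assumptions: boxes are normalized [x1, y1, x2, y2] relative to the
# 640x480 network input, and the class order matches lpd_labels.txt.
labels = ["lpd"]  # placeholder; load the real list from lpd_labels.txt

num_dets = int(results.as_numpy('BatchedNMS')[0][0])
boxes = results.as_numpy('BatchedNMS_1')[0]    # (200, 4)
scores = results.as_numpy('BatchedNMS_2')[0]   # (200,)
classes = results.as_numpy('BatchedNMS_3')[0]  # (200,)

h, w = im.shape[:2]  # original image size, to scale the boxes back
for i in range(num_dets):
    if scores[i] < 0.3:  # confidence threshold, tune as needed
        continue
    x1, y1, x2, y2 = boxes[i]
    x1, x2 = int(x1 * w), int(x2 * w)
    y1, y2 = int(y1 * h), int(y2 * h)
    cls_id = int(classes[i])
    label = labels[cls_id] if cls_id < len(labels) else str(cls_id)
    print(f"{label}: score={scores[i]:.2f} box=({x1},{y1},{x2},{y2})")
    cv2.rectangle(im, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("lpd_result.jpg", im)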

jetson@ubuntu:~/Documents/tritonserver$ ^C
jetson@ubuntu:~/Documents/tritonserver$ ^C
jetson@ubuntu:~/Documents/tritonserver$ sudo CUDA_VISIBLE_DEVICES=0 bin/tritonserver --model-repository=triton_model_repo
I0705 10:19:56.956523 40939 pinned_memory_manager.cc:275] Pinned memory pool is created at ‘0x203eae000’ with size 268435456
I0705 10:19:56.956882 40939 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0705 10:19:56.963250 40939 model_lifecycle.cc:469] loading: us_lpd_yolov4-tiny:1
I0705 10:19:57.036950 40939 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0705 10:19:57.037031 40939 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.19
I0705 10:19:57.037046 40939 tensorrt.cc:81] ‘tensorrt’ TRITONBACKEND API version: 1.19
I0705 10:19:57.037058 40939 tensorrt.cc:105] backend configuration:
{“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/tritonserver/backends”,“min-compute-capability”:“5.300000”,“default-max-batch-size”:“4”}}
I0705 10:19:57.037518 40939 tensorrt.cc:231] TRITONBACKEND_ModelInitialize: us_lpd_yolov4-tiny (version 1)
I0705 10:19:57.051484 40939 logging.cc:46] Loaded engine size: 4 MiB
W0705 10:19:57.056199 40939 logging.cc:43] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
I0705 10:19:57.136443 40939 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +5, GPU +3, now: CPU 57, GPU 4484 (MiB)
I0705 10:19:57.143884 40939 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +2, GPU +0, now: CPU 59, GPU 4484 (MiB)
I0705 10:19:57.146027 40939 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2, now: CPU 0, GPU 2 (MiB)
I0705 10:19:57.157950 40939 tensorrt.cc:297] TRITONBACKEND_ModelInstanceInitialize: us_lpd_yolov4-tiny_0_0 (GPU device 0)
I0705 10:19:57.164015 40939 logging.cc:46] Loaded engine size: 4 MiB
W0705 10:19:57.164346 40939 logging.cc:43] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
I0705 10:19:57.239660 40939 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 59, GPU 4485 (MiB)
I0705 10:19:57.241331 40939 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 59, GPU 4485 (MiB)
I0705 10:19:57.243828 40939 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2, now: CPU 0, GPU 2 (MiB)
I0705 10:19:57.245475 40939 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 50, GPU 4485 (MiB)
I0705 10:19:57.247129 40939 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 50, GPU 4485 (MiB)
I0705 10:19:57.277186 40939 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +197, now: CPU 0, GPU 199 (MiB)
I0705 10:19:57.280999 40939 instance_state.cc:188] Created instance us_lpd_yolov4-tiny_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0705 10:19:57.281585 40939 model_lifecycle.cc:835] successfully loaded ‘us_lpd_yolov4-tiny’
I0705 10:19:57.281705 40939 server.cc:607]
±-----------------±-----+
| Repository Agent | Path |
±-----------------±-----+
±-----------------±-----+

I0705 10:19:57.281791 40939 server.cc:634]
±---------±----------------------------------------------------------±--------------------------------------------------------------------------+
| Backend | Path | Config |
±---------±----------------------------------------------------------±--------------------------------------------------------------------------+
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {“cmdline”:{“auto-complete-config”:“true”,“backend-directory”:“/opt/trito |
| | | nserver/backends”,“min-compute-capability”:“5.300000”,“default-max-batch- |
| | | size”:“4”}} |
±---------±----------------------------------------------------------±--------------------------------------------------------------------------+

I0705 10:19:57.281854 40939 server.cc:677]
±-------------------±--------±-------+
| Model | Version | Status |
±-------------------±--------±-------+
| us_lpd_yolov4-tiny | 1 | READY |
±-------------------±--------±-------+

I0705 10:19:57.282077 40939 tritonserver.cc:2538]
±---------------------------------±---------------------------------------------------------------------------------------------------------------+
| Option | Value |
±---------------------------------±---------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.44.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configurati |
| | on system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 5.3 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
±---------------------------------±---------------------------------------------------------------------------------------------------------------+

I0705 10:19:57.284679 40939 grpc_server.cc:2466] Started GRPCInferenceService at 0.0.0.0:8001
I0705 10:19:57.285076 40939 http_server.cc:4636] Started HTTPService at 0.0.0.0:8000
I0705 10:19:57.327072 40939 http_server.cc:320] Started Metrics Service at 0.0.0.0:8002

infer_config {
  unique_id: 2
  gpu_ids: 0
  max_batch_size: 4
  backend {
    triton {
      model_name: "us_lpd_yolov4-tiny"
      version: -1
      grpc {
        url: "0.0.0.0:8001"
      }
    }
  }

  preprocess {
    #network_format: IMAGE_FORMAT_BGR
    network_format: IMAGE_FORMAT_RGB
    #tensor_order: TENSOR_ORDER_LINEAR
    tensor_order: TENSOR_ORDER_NONE
    maintain_aspect_ratio: 0
    frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    frame_scaling_filter: 1
    normalize {
      scale_factor: 1
    }
  }

  postprocess {
    labelfile_path: "lpd_labels.txt"
    detection {
      num_detected_classes: 4
      custom_parse_bbox_func: "NvDsInferParseCustomBatchedNMSTLT"
    }
  }

  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream-7.0/lib/libnvds_infercustomparser.so"
  }
}

input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  operate_on_gie_id: 1
  operate_on_class_ids: [0]
  secondary_reinfer_interval: 0
  async_mode: false
  object_control {
    bbox_filter {
      min_width: 64
      min_height: 64
    }
  }
}

output_control {
  output_tensor_meta: true
}

name: "us_lpd_yolov4-tiny"
platform: "tensorrt_plan"
max_batch_size: 4
default_model_filename: "yolov4_tiny_usa_deployable.etlt_b16_gpu0_fp16.engine"
input [
  {
    name: "Input"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 480, 640 ]
  }
]

output [
  {
    name: "BatchedNMS"
    data_type: TYPE_INT32
    dims: [ 1 ]
  },
  {
    name: "BatchedNMS_1"
    data_type: TYPE_FP32
    dims: [ 200, 4 ]
  },
  {
    name: "BatchedNMS_2"
    data_type: TYPE_FP32
    dims: [ 200 ]
  },
  {
    name: "BatchedNMS_3"
    data_type: TYPE_FP32
    dims: [ 200 ]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: 0
  }
]
import sys
sys.path.append('../')
from pathlib import Path
from os import environ
import gi
import configparser
import argparse
gi.require_version(‘Gst’, ‘1.0’)
from gi.repository import GLib, Gst
from ctypes import *
import time
import sys
import math
import platform
from common.platform_info import PlatformInfo
from common.bus_call import bus_call
from common.FPS import PERF_DATA

import pyds

no_display = False
silent = False
file_loop = False
perf_data = None
measure_latency = False

MAX_DISPLAY_LEN=64
PGIE_CLASS_ID_VEHICLE = 0
PGIE_CLASS_ID_BICYCLE = 1
PGIE_CLASS_ID_PERSON = 2
PGIE_CLASS_ID_ROADSIGN = 3
MUXER_OUTPUT_WIDTH=1920
MUXER_OUTPUT_HEIGHT=1080
MUXER_BATCH_TIMEOUT_USEC = 33000
TILED_OUTPUT_WIDTH=1280
TILED_OUTPUT_HEIGHT=720
GST_CAPS_FEATURES_NVMM="memory:NVMM"
OSD_PROCESS_MODE= 0
OSD_DISPLAY_TEXT= 1
pgie_classes_str= ["Vehicle", "TwoWheeler", "Person", "RoadSign"]

# pgie_src_pad_buffer_probe will extract metadata received on tiler sink pad
# and update params for drawing rectangle, object information etc.

def pgie_src_pad_buffer_probe(pad,info,u_data):
frame_number=0
num_rects=0
got_fps = False
gst_buffer = info.get_buffer()
if not gst_buffer:
print("Unable to get GstBuffer ")
return
# Retrieve batch metadata from the gst_buffer
# Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
# C address of gst_buffer as input, which is obtained with hash(gst_buffer)

# Enable latency measurement via probe if environment variable NVDS_ENABLE_LATENCY_MEASUREMENT=1 is set.
# To enable component level latency measurement, please set environment variable
# NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 in addition to the above.
global measure_latency
if measure_latency:
    num_sources_in_batch = pyds.nvds_measure_buffer_latency(hash(gst_buffer))
    if num_sources_in_batch == 0:
        print("Unable to get number of sources in GstBuffer for latency measurement")

batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
l_frame = batch_meta.frame_meta_list
while l_frame is not None:
    try:
        # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
        # The casting is done by pyds.NvDsFrameMeta.cast()
        # The casting also keeps ownership of the underlying memory
        # in the C code, so the Python garbage collector will leave
        # it alone.
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    except StopIteration:
        break

    frame_number=frame_meta.frame_num
    l_obj=frame_meta.obj_meta_list
    num_rects = frame_meta.num_obj_meta
    obj_counter = {
    PGIE_CLASS_ID_VEHICLE:0,
    PGIE_CLASS_ID_PERSON:0,
    PGIE_CLASS_ID_BICYCLE:0,
    PGIE_CLASS_ID_ROADSIGN:0
    }
    while l_obj is not None:
        try: 
            # Casting l_obj.data to pyds.NvDsObjectMeta
            obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
        except StopIteration:
            break
        obj_counter[obj_meta.class_id] += 1
        try: 
            l_obj=l_obj.next
        except StopIteration:
            break
    if not silent:
        print("Frame Number=", frame_number, "Number of Objects=",num_rects,"Vehicle_count=",obj_counter[PGIE_CLASS_ID_VEHICLE],"Person_count=",obj_counter[PGIE_CLASS_ID_PERSON])

    # Update frame rate through this probe
    stream_index = "stream{0}".format(frame_meta.pad_index)
    global perf_data
    perf_data.update_fps(stream_index)

    try:
        l_frame=l_frame.next
    except StopIteration:
        break

return Gst.PadProbeReturn.OK

def cb_newpad(decodebin, decoder_src_pad,data):
print(“In cb_newpad\n”)
caps=decoder_src_pad.get_current_caps()
if not caps:
caps = decoder_src_pad.query_caps()
gststruct=caps.get_structure(0)
gstname=gststruct.get_name()
source_bin=data
features=caps.get_features(0)

# Need to check if the pad created by the decodebin is for video and not
# audio.
print("gstname=",gstname)
if(gstname.find("video")!=-1):
    # Link the decodebin pad only if decodebin has picked nvidia
    # decoder plugin nvdec_*. We do this by checking if the pad caps contain
    # NVMM memory features.
    print("features=",features)
    if features.contains("memory:NVMM"):
        # Get the source bin ghost pad
        bin_ghost_pad=source_bin.get_static_pad("src")
        if not bin_ghost_pad.set_target(decoder_src_pad):
            sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")
    else:
        sys.stderr.write(" Error: Decodebin did not pick nvidia decoder plugin.\n")

def decodebin_child_added(child_proxy,Object,name,user_data):
print(“Decodebin child added:”, name, “\n”)
if(name.find(“decodebin”) != -1):
Object.connect(“child-added”,decodebin_child_added,user_data)

if "source" in name:
    source_element = child_proxy.get_by_name("source")
    if source_element.find_property('drop-on-latency') != None:
        Object.set_property("drop-on-latency", True)

def create_source_bin(index,uri):
print(“Creating source bin”)

# Create a source GstBin to abstract this bin's content from the rest of the
# pipeline
bin_name="source-bin-%02d" %index
print(bin_name)
nbin=Gst.Bin.new(bin_name)
if not nbin:
    sys.stderr.write(" Unable to create source bin \n")

# Source element for reading from the uri.
# We will use decodebin and let it figure out the container format of the
# stream and the codec and plug the appropriate demux and decode plugins.
if file_loop:
    # use nvurisrcbin to enable file-loop
    uri_decode_bin=Gst.ElementFactory.make("nvurisrcbin", "uri-decode-bin")
    uri_decode_bin.set_property("file-loop", 1)
    uri_decode_bin.set_property("cudadec-memtype", 0)
else:
    uri_decode_bin=Gst.ElementFactory.make("uridecodebin", "uri-decode-bin")
if not uri_decode_bin:
    sys.stderr.write(" Unable to create uri decode bin \n")
# We set the input uri to the source element
uri_decode_bin.set_property("uri",uri)
# Connect to the "pad-added" signal of the decodebin which generates a
# callback once a new pad for raw data has beed created by the decodebin
uri_decode_bin.connect("pad-added",cb_newpad,nbin)
uri_decode_bin.connect("child-added",decodebin_child_added,nbin)

# We need to create a ghost pad for the source bin which will act as a proxy
# for the video decoder src pad. The ghost pad will not have a target right
# now. Once the decode bin creates the video decoder and generates the
# cb_newpad callback, we will set the ghost pad target to the video decoder
# src pad.
Gst.Bin.add(nbin,uri_decode_bin)
bin_pad=nbin.add_pad(Gst.GhostPad.new_no_target("src",Gst.PadDirection.SRC))
if not bin_pad:
    sys.stderr.write(" Failed to add ghost pad in source bin \n")
    return None
return nbin

def main(args, config=None, disable_probe=False):
global perf_data
perf_data = PERF_DATA(len(args))

number_sources=len(args)

platform_info = PlatformInfo()
# Standard GStreamer initialization
Gst.init(None)

# Create gstreamer elements */
# Create Pipeline element that will form a connection of other elements
print("Creating Pipeline \n ")
pipeline = Gst.Pipeline()
is_live = False

if not pipeline:
    sys.stderr.write(" Unable to create Pipeline \n")
print("Creating streamux \n ")

# Create nvstreammux instance to form batches from one or more sources.
streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
if not streammux:
    sys.stderr.write(" Unable to create NvStreamMux \n")

pipeline.add(streammux)
for i in range(number_sources):
    print("Creating source_bin ",i," \n ")
    uri_name=args[i]
    if uri_name.find("rtsp://") == 0 :
        is_live = True
    source_bin=create_source_bin(i, uri_name)
    if not source_bin:
        sys.stderr.write("Unable to create source bin \n")
    pipeline.add(source_bin)
    padname="sink_%u" %i
    sinkpad= streammux.request_pad_simple(padname) 
    if not sinkpad:
        sys.stderr.write("Unable to create sink pad bin \n")
    srcpad=source_bin.get_static_pad("src")
    if not srcpad:
        sys.stderr.write("Unable to create src pad bin \n")
    srcpad.link(sinkpad)
queue1=Gst.ElementFactory.make("queue","queue1")
queue2=Gst.ElementFactory.make("queue","queue2")
queue3=Gst.ElementFactory.make("queue","queue3")
queue4=Gst.ElementFactory.make("queue","queue4")
queue5=Gst.ElementFactory.make("queue","queue5")
pipeline.add(queue1)
pipeline.add(queue2)
pipeline.add(queue3)
pipeline.add(queue4)
pipeline.add(queue5)

nvdslogger = None
requested_pgie = "nvinferserver-grpc"

print("Creating Pgie \n ")
if requested_pgie != None and (requested_pgie == 'nvinferserver' or requested_pgie == 'nvinferserver-grpc') :
    pgie = Gst.ElementFactory.make("nvinferserver", "primary-inference")
elif requested_pgie != None and requested_pgie == 'nvinfer':
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
else:
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")

if not pgie:
    sys.stderr.write(" Unable to create pgie :  %s\n" % requested_pgie)

if disable_probe:
    # Use nvdslogger for perf measurement instead of probe function
    print ("Creating nvdslogger \n")
    nvdslogger = Gst.ElementFactory.make("nvdslogger", "nvdslogger")

print("Creating tiler \n ")
tiler=Gst.ElementFactory.make("nvmultistreamtiler", "nvtiler")
if not tiler:
    sys.stderr.write(" Unable to create tiler \n")
print("Creating nvvidconv \n ")
nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
if not nvvidconv:
    sys.stderr.write(" Unable to create nvvidconv \n")
print("Creating nvosd \n ")
nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
if not nvosd:
    sys.stderr.write(" Unable to create nvosd \n")
nvosd.set_property('process-mode',OSD_PROCESS_MODE)
nvosd.set_property('display-text',OSD_DISPLAY_TEXT)

if file_loop:
    if platform_info.is_integrated_gpu():
        # Set nvbuf-memory-type=4 for integrated gpu for file-loop (nvurisrcbin case)
        streammux.set_property('nvbuf-memory-type', 4)
    else:
        # Set nvbuf-memory-type=2 for x86 for file-loop (nvurisrcbin case)
        streammux.set_property('nvbuf-memory-type', 2)

if no_display:
    print("Creating Fakesink \n")
    sink = Gst.ElementFactory.make("fakesink", "fakesink")
    sink.set_property('enable-last-sample', 0)
    sink.set_property('sync', 0)
else:
    if platform_info.is_integrated_gpu():
        print("Creating nv3dsink \n")
        sink = Gst.ElementFactory.make("nv3dsink", "nv3d-sink")
        if not sink:
            sys.stderr.write(" Unable to create nv3dsink \n")
    else:
        if platform_info.is_platform_aarch64():
            print("Creating nv3dsink \n")
            sink = Gst.ElementFactory.make("nv3dsink", "nv3d-sink")
        else:
            print("Creating EGLSink \n")
            sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
        if not sink:
            sys.stderr.write(" Unable to create egl sink \n")

if not sink:
    sys.stderr.write(" Unable to create sink element \n")

if is_live:
    print("At least one of the sources is live")
    streammux.set_property('live-source', 1)

streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', number_sources)
streammux.set_property('batched-push-timeout', MUXER_BATCH_TIMEOUT_USEC)
if requested_pgie == "nvinferserver" and config != None:
    pgie.set_property('config-file-path', config)
elif requested_pgie == "nvinferserver-grpc":
    print("nvinferserver-grpc satisfied")
    pgie.set_property('config-file-path', "config_triton_grpc_infer_primary_lpd.txt")
elif requested_pgie == "nvinfer" and config != None:
    pgie.set_property('config-file-path', config)
else:
    pgie.set_property('config-file-path', "dstest3_pgie_config.txt")
pgie_batch_size=pgie.get_property("batch-size")
if(pgie_batch_size != number_sources):
    print("WARNING: Overriding infer-config batch-size",pgie_batch_size," with number of sources ", number_sources," \n")
    pgie.set_property("batch-size",number_sources)
tiler_rows=int(math.sqrt(number_sources))
tiler_columns=int(math.ceil((1.0*number_sources)/tiler_rows))
tiler.set_property("rows",tiler_rows)
tiler.set_property("columns",tiler_columns)
tiler.set_property("width", TILED_OUTPUT_WIDTH)
tiler.set_property("height", TILED_OUTPUT_HEIGHT)
sink.set_property("qos",0)

print("Adding elements to Pipeline \n")
pipeline.add(pgie)
if nvdslogger:
    pipeline.add(nvdslogger)
pipeline.add(tiler)
pipeline.add(nvvidconv)
pipeline.add(nvosd)
pipeline.add(sink)

print("Linking elements in the Pipeline \n")
streammux.link(queue1)
queue1.link(pgie)
pgie.link(queue2)
if nvdslogger:
    queue2.link(nvdslogger)
    nvdslogger.link(tiler)
else:
    queue2.link(tiler)
tiler.link(queue3)
queue3.link(nvvidconv)
nvvidconv.link(queue4)
queue4.link(nvosd)
nvosd.link(queue5)
queue5.link(sink)   

# create an event loop and feed gstreamer bus mesages to it
loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect ("message", bus_call, loop)
pgie_src_pad=pgie.get_static_pad("src")
if not pgie_src_pad:
    sys.stderr.write(" Unable to get src pad \n")
else:
    if not disable_probe:
        pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pgie_src_pad_buffer_probe, 0)
        # perf callback function to print fps every 5 sec
        GLib.timeout_add(5000, perf_data.perf_print_callback)

# Enable latency measurement via probe if environment variable NVDS_ENABLE_LATENCY_MEASUREMENT=1 is set.
# To enable component level latency measurement, please set environment variable
# NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 in addition to the above.
if environ.get('NVDS_ENABLE_LATENCY_MEASUREMENT') == '1':
    print ("Pipeline Latency Measurement enabled!\nPlease set env var NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 for Component Latency Measurement")
    global measure_latency
    measure_latency = True

# List the sources
print("Now playing...")
for i, source in enumerate(args):
    print(i, ": ", source)

print("Starting pipeline \n")
# start play back and listed to events		
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except:
    pass
# cleanup
print("Exiting app\n")
pipeline.set_state(Gst.State.NULL)

def parse_args():

parser = argparse.ArgumentParser(prog="deepstream_test_3",
                description="deepstream-test3 multi stream, multi model inference reference app")
parser.add_argument(
    "-i",
    "--input",
    help="Path to input streams",
    nargs="+",
    metavar="URIs",
    default=["a"],
    required=True,
)
# parser.add_argument(
#     "-c",
#     "--configfile",
#     metavar="config_location.txt",
#     default=None,
#     help="Choose the config-file to be used with specified pgie",
# )
# parser.add_argument(
#     "-g",
#     "--pgie",
#     default=None,
#     help="Choose Primary GPU Inference Engine",
#     choices=["nvinfer", "nvinferserver", "nvinferserver-grpc"],
# )
# parser.add_argument(
#     "--no-display",
#     action="store_true",
#     default=False,
#     dest='no_display',
#     help="Disable display of video output",
# )
# parser.add_argument(
#     "--file-loop",
#     action="store_true",
#     default=False,
#     dest='file_loop',
#     help="Loop the input file sources after EOS",
# )
# parser.add_argument(
#     "--disable-probe",
#     action="store_true",
#     default=False,
#     dest='disable_probe',
#     help="Disable the probe function and use nvdslogger for FPS",
# )
# parser.add_argument(
#     "-s",
#     "--silent",
#     action="store_true",
#     default=False,
#     dest='silent',
#     help="Disable verbose output",
# )
# Check input arguments
if len(sys.argv) == 1:
    parser.print_help(sys.stderr)
    sys.exit(1)
args = parser.parse_args()

stream_paths = args.input
# pgie = args.pgie
# config = args.configfile
# disable_probe = args.disable_probe
# global no_display
# global silent
# global file_loop
# no_display = args.no_display
# silent = args.silent
# file_loop = args.file_loop

# if config and not pgie or pgie and not config:
#     sys.stderr.write ("\nEither pgie or configfile is missing. Please specify both! Exiting...\n\n\n\n")
#     parser.print_help()
#     sys.exit(1)
# if config:
#     config_path = Path(config)
#     if not config_path.is_file():
#         sys.stderr.write ("Specified config-file: %s doesn't exist. Exiting...\n\n" % config)
#         sys.exit(1)

# print(vars(args))
return stream_paths

if __name__ == '__main__':
    stream_paths = parse_args()
    sys.exit(main(stream_paths))

We are not able to get any detection output.

  1. If you are using Triton gRPC mode, here are the ready-made configurations.
  2. From the logs, tritonserver succeeded in loading the model, but there is no interaction log at all. Could you share the client-side log? You can add logging in NvDsInferParseCustomBatchedNMSTLT to check whether this function is called (see the tensor-meta probe sketch after this list).
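For a quick client-side check, since the nvinferserver config sets output_tensor_meta: true, you can also dump the raw Triton output layers from a pad probe in the Python app. A rough sketch using the pyds tensor-meta API (as in the deepstream-ssd-parser sample); the helper name and its placement are illustrative:

# Rough sketch: print the raw inference output layers attached as tensor meta.
# Assumes output_tensor_meta: true in the nvinferserver config. Depending on the
# process mode, the meta may hang off frame_meta.frame_user_meta_list (primary)
# or obj_meta.obj_user_meta_list (secondary), so check both if nothing prints.
import pyds

def dump_tensor_meta(frame_meta):
    l_user = frame_meta.frame_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
            tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            for i in range(tensor_meta.num_output_layers):
                layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
                print("output layer", i, ":", layer.layerName)
        try:
            l_user = l_user.next
        except StopIteration:
            break

Call it from pgie_src_pad_buffer_probe for each frame_meta; if no layers are printed, nvinferserver is not attaching any output from Triton and the server interaction itself needs to be checked first.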

Could you give some information about how to set up the Triton server in Docker with DeepStream 7.0 and CUDA 12.2? Also, is it possible to give an input image to the LPD model and then get its output?

You can uncomment "rm -rf $TRITON_DOWNLOADS" in /opt/nvidia/deepstream/deepstream/samples/triton_backend_setup.sh first, then execute this script. You can find tritonserver with: find / -name "tritonserver" 2>/dev/null.

Is it possible to give an input image to the LPD model and then get its output? Could you give an example of this?

Please refer to my comment on July 5. The nvinferserver configuration for LPD is ready-made. You can replace dstest1_pgie_nvinferserver_config.txt with lpd_yolov4-tiny_us.txt in the deepstream-test1 sample for testing.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.