Inferencing on DINO in Triton Inference Server

sanujen.20 · August 4, 2024, 2:52pm

Description

I am trying to start Triton Inference Server with a DINO engine file, and I am looking for any example references for serving the DINO model on Triton Inference Server. Thanks.

I tried to run the engine following what I found in the docs and ended up with the error below.

Environment

GPU Type: Tesla T4
Nvidia Driver Version: 535.171.04
CUDA Version: 12.2
Operating System + Version: Ubuntu 22.04.4 LTS
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): tritonserver:24.07-py3

Relevant Files

model repo structure

Steps To Reproduce

Docker run command
sudo docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.07-py3 tritonserver --model-repository=/models

Error thrown:

Complete Error Logs

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 24.07 (build 101814614)
Triton Server Version 2.48.0

Copyright (c) 2018-2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.5 driver version 555.42.06 with kernel driver version 535.171.04.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0804 14:46:20.658159 1 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x70b29c000000' with size 268435456"
I0804 14:46:20.661299 1 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0804 14:46:20.666066 1 model_lifecycle.cc:472] "loading: dino:1"
I0804 14:46:20.684498 1 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0804 14:46:20.684531 1 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0804 14:46:20.684537 1 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0804 14:46:20.684542 1 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0804 14:46:20.687355 1 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: dino (version 1)"
I0804 14:46:20.816988 1 logging.cc:46] "Loaded engine size: 238 MiB"
E0804 14:46:20.859823 1 logging.cc:40] "IRuntime::deserializeCudaEngine: Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 238, Serialized Engine Version: 236)"
I0804 14:46:20.863841 1 tensorrt.cc:274] "TRITONBACKEND_ModelFinalize: delete model state"
E0804 14:46:20.863867 1 model_lifecycle.cc:641] "failed to load 'dino' version 1: Internal: unable to load plan file to auto complete config: /models/dino/1/model.plan"
I0804 14:46:20.863878 1 model_lifecycle.cc:776] "failed to load 'dino'"
I0804 14:46:20.863945 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0804 14:46:20.864014 1 server.cc:631] 
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                      | Config                                                                                                                                                        |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0804 14:46:20.864075 1 server.cc:674] 
+-------+---------+----------------------------------------------------------------------------------------------------+
| Model | Version | Status                                                                                             |
+-------+---------+----------------------------------------------------------------------------------------------------+
| dino  | 1       | UNAVAILABLE: Internal: unable to load plan file to auto complete config: /models/dino/1/model.plan |
+-------+---------+----------------------------------------------------------------------------------------------------+

I0804 14:46:20.896006 1 metrics.cc:877] "Collecting metrics for GPU 0: Tesla T4"
I0804 14:46:20.902783 1 metrics.cc:770] "Collecting CPU metrics"
I0804 14:46:20.902958 1 tritonserver.cc:2598] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.48.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0804 14:46:20.903017 1 server.cc:305] "Waiting for in-flight requests to complete."
I0804 14:46:20.903023 1 server.cc:321] "Timeout 30: Found 0 model versions that have in-flight inferences"
I0804 14:46:20.903064 1 server.cc:336] "All models are stopped, unloading models"
I0804 14:46:20.903095 1 server.cc:345] "Timeout 30: Found 0 live models and 0 in-flight non-inference requests"
error: creating server: Internal - failed to load all models
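
The line that matters in the log above is the `IRuntime::deserializeCudaEngine` error: the runtime in the container reports serialization version tag 238, while the engine file carries tag 236. A mismatch like this usually means the plan was built with a different TensorRT version than the one shipped in the tritonserver:24.07-py3 image, in which case the engine would need to be rebuilt with a matching TensorRT version. Below is a minimal sketch for pulling the two tags out of a Triton log. The function name and the regular expression are illustrative assumptions based only on the message format shown above, not part of any Triton or TensorRT API.

```python
import re

# Assumption: the message format matches the TensorRT error shown in the log above.
VERSION_RE = re.compile(
    r"Current Version: (\d+), Serialized Engine Version: (\d+)"
)


def find_version_mismatch(log_text):
    """Return (runtime_tag, engine_tag) if the log reports a tag mismatch, else None.

    Hypothetical helper for illustration only.
    """
    match = VERSION_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The error line copied from the log above.
sample = (
    'E0804 14:46:20.859823 1 logging.cc:40] "IRuntime::deserializeCudaEngine: '
    "Error Code 1: Serialization (Serialization assertion stdVersionRead == "
    "kSERIALIZATION_VERSION failed.Version tag does not match. Note: "
    'Current Version: 238, Serialized Engine Version: 236)"'
)

result = find_version_mismatch(sample)
if result is not None:
    runtime_tag, engine_tag = result
    print(f"runtime expects tag {runtime_tag}, engine file has tag {engine_tag}")
```

The helper only reports the two tags; it does not map them to TensorRT release numbers, since that mapping is not given in the log.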

AakankshaS · August 29, 2024, 8:39pm

Hi @sanujen.20 ,
Apologies for the delay. We would recommend raising this issue at Issues · triton-inference-server/server · GitHub.
Thanks