Unable to run Triton example

(base) utkarsh@utkarsh:~$ docker run --gpus=1 --rm --net=host --privileged -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models~

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 535.161.08 which has support for CUDA 12.2. This container
was built with CUDA 12.4 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0409 19:59:44.619329 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x7f7568000000' with size 268435456
I0409 19:59:44.619610 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0409 19:59:44.654197 1 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650
I0409 19:59:44.656117 1 metrics.cc:770] Collecting CPU metrics
I0409 19:59:44.656246 1 tritonserver.cc:2538]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.44.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models~ |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0409 19:59:44.656255 1 server.cc:304] No server context available. Exiting immediately.
error: creating server: Internal - failed to stat file /models~
W0409 19:59:45.657588 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
(base) utkarsh@utkarsh:~$ docker run -it --rm --net=host --privileged nvcr.io/nvidia/tritonserver:24.03-py3-sdk

=================================
== Triton Inference Server SDK ==
=================================

NVIDIA Release 24.03 (build 86102633)

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

root@utkarsh:/workspace# /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
error: failed to get model metadata: HTTP client failed: Couldn't connect to server
error: failed to parse model metadata: failed to parse JSON at 0: The document is empty.
error: failed to get model config: HTTP client failed: Couldn't connect to server
error: failed to parse model config: failed to parse JSON at 0: The document is empty.
expecting 1 input, got 0
root@utkarsh:/workspace#

The commands I use are:

Step 1: Create the example model repository

git clone -b r24.03 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh
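After fetch_models.sh runs, Triton expects each model under the repository as <repo>/<model>/config.pbtxt plus a numeric version directory holding the model file. A small sketch for sanity-checking that layout before mounting it (the mock densenet_onnx paths below are stand-ins; point `repo` at your real model_repository):

```python
# Sketch: verify a model repository has the layout Triton expects:
# <repo>/<model>/config.pbtxt and a numeric version directory (e.g. "1")
# containing the model file. Note config.pbtxt can be optional for some
# backends when strict_model_config is off, but the example repo ships one.
import pathlib
import tempfile

def check_repository(repo: pathlib.Path) -> list[str]:
    problems = []
    for model in sorted(p for p in repo.iterdir() if p.is_dir()):
        if not (model / "config.pbtxt").is_file():
            problems.append(f"{model.name}: missing config.pbtxt")
        if not any(d.is_dir() and d.name.isdigit() for d in model.iterdir()):
            problems.append(f"{model.name}: missing numeric version directory")
    return problems

# Mock repository mimicking the example's densenet_onnx model.
repo = pathlib.Path(tempfile.mkdtemp()) / "model_repository"
(repo / "densenet_onnx" / "1").mkdir(parents=True)
(repo / "densenet_onnx" / "config.pbtxt").touch()
(repo / "densenet_onnx" / "1" / "model.onnx").touch()
print(check_repository(repo))  # → []
```

An empty list means the layout looks right; any reported problem would also make tritonserver fail at startup.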

Step 2: Launch triton from the NGC Triton container

docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models

Step 3: Sending an Inference Request

In a separate console, launch the image_client example from the NGC Triton SDK container

docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.03-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

Inference should return the following:

Image '/workspace/images/mug.jpg':
15.346230 (504) = COFFEE MUG
13.224326 (968) = CUP
10.422965 (505) = COFFEEPOT
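For reference, the -s INCEPTION flag tells image_client to rescale 8-bit pixel values from [0, 255] into [-1, 1] before building the input tensor. A minimal sketch of that transform, assuming the standard Inception scaling x/127.5 - 1 (the exact code in image_client may differ in detail):

```python
# Sketch of Inception-style preprocessing: map uint8 pixels to [-1, 1].
def inception_scale(pixel: int) -> float:
    return pixel / 127.5 - 1.0

print(inception_scale(0), inception_scale(255))  # → -1.0 1.0
```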

Note: I already created the example model repository (Step 1) before launching the server.