Unable to run Triton example

(base) utkarsh@utkarsh:~$ docker run --gpus=1 --rm --net=host --privileged -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models~

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 535.161.08 which has support for CUDA 12.2. This container
was built with CUDA 12.4 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0409 19:59:44.619329 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x7f7568000000' with size 268435456
I0409 19:59:44.619610 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0409 19:59:44.654197 1 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650
I0409 19:59:44.656117 1 metrics.cc:770] Collecting CPU metrics
I0409 19:59:44.656246 1 tritonserver.cc:2538]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.44.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models~ |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0409 19:59:44.656255 1 server.cc:304] No server context available. Exiting immediately.
error: creating server: Internal - failed to stat file /models~
W0409 19:59:45.657588 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
(base) utkarsh@utkarsh:~$ docker run -it --rm --net=host --privileged nvcr.io/nvidia/tritonserver:24.03-py3-sdk

=================================
== Triton Inference Server SDK ==
=================================

NVIDIA Release 24.03 (build 86102633)

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

root@utkarsh:/workspace# /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
error: failed to get model metadata: HTTP client failed: Couldn't connect to server
error: failed to parse model metadata: failed to parse JSON at 0: The document is empty.
error: failed to get model config: HTTP client failed: Couldn't connect to server
error: failed to parse model config: failed to parse JSON at 0: The document is empty.
expecting 1 input, got 0
root@utkarsh:/workspace#

The commands I use are:

Step 1: Create the example model repository

git clone -b r24.03 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh
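After fetch_models.sh runs, Triton expects each model under the repository as <repo>/<model>/config.pbtxt plus a numeric version directory holding the model file. A small sketch for sanity-checking that layout before mounting it (the mock densenet_onnx paths below are stand-ins; point `repo` at your real model_repository):

```python
# Sketch: verify a model repository has the layout Triton expects:
# <repo>/<model>/config.pbtxt and a numeric version directory (e.g. "1")
# containing the model file. Note config.pbtxt can be optional for some
# backends when strict_model_config is off, but the example repo ships one.
import pathlib
import tempfile

def check_repository(repo: pathlib.Path) -> list[str]:
    problems = []
    for model in sorted(p for p in repo.iterdir() if p.is_dir()):
        if not (model / "config.pbtxt").is_file():
            problems.append(f"{model.name}: missing config.pbtxt")
        if not any(d.is_dir() and d.name.isdigit() for d in model.iterdir()):
            problems.append(f"{model.name}: missing numeric version directory")
    return problems

# Mock repository mimicking the example's densenet_onnx model.
repo = pathlib.Path(tempfile.mkdtemp()) / "model_repository"
(repo / "densenet_onnx" / "1").mkdir(parents=True)
(repo / "densenet_onnx" / "config.pbtxt").touch()
(repo / "densenet_onnx" / "1" / "model.onnx").touch()
print(check_repository(repo))  # → []
```

An empty list means the layout looks right; any reported problem would also make tritonserver fail at startup.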

Step 2: Launch triton from the NGC Triton container

docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models

Step 3: Sending an Inference Request

In a separate console, launch the image_client example from the NGC Triton SDK container

docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.03-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

Inference should return the following:

Image '/workspace/images/mug.jpg':
15.346230 (504) = COFFEE MUG
13.224326 (968) = CUP
10.422965 (505) = COFFEEPOT
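For reference, the -s INCEPTION flag tells image_client to rescale 8-bit pixel values from [0, 255] into [-1, 1] before building the input tensor. A minimal sketch of that transform, assuming the standard Inception scaling x/127.5 - 1 (the exact code in image_client may differ in detail):

```python
# Sketch of Inception-style preprocessing: map uint8 pixels to [-1, 1].
def inception_scale(pixel: int) -> float:
    return pixel / 127.5 - 1.0

print(inception_scale(0), inception_scale(255))  # → -1.0 1.0
```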

Note: I already created the example model repository (Step 1) before launching the server.