Model has kind KIND_GPU but no GPUs are available

Description

I tried to adapt the code and model from the Jetson example in the triton-inference-server/server repository (server/docs/examples/jetson/concurrency_and_dynamic_batching) to test using the C API and Triton Inference Server as a shared library. I launched the compiled people_detection.cc directly with the raw .onnx PeopleNet model. Three unexpected behaviors were observed:

  1. When instance_group in config.pbtxt was set to kind: KIND_GPU, the following message was always shown:
    peoplenet has kind KIND_GPU but no GPUs are available

  2. When instance_group in config.pbtxt was set to kind: KIND_CPU, there was no complaint and inference ran successfully (with a visual detection result), but GPU memory and utilization were still consumed (as monitored with nvidia-smi).

  3. I also noticed that if I compiled people_detection.cc without TRITON_ENABLE_GPU, inference still consumed GPU memory and utilization (see the diagnostic sketch below).
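A plausible source of behaviors 2 and 3 is the backend rather than the application: libtriton_onnxruntime.so was built against the GPU distribution of ONNX Runtime, and the TRITON_ENABLE_GPU flag used when compiling people_detection.cc has no effect on that backend. As a quick diagnostic, the following standalone sketch (assuming onnxruntime_cxx_api.h from the same onnxruntime-linux-x64-gpu-1.12.1 package is on the include path) prints which execution providers the linked ONNX Runtime build exposes:

#include <onnxruntime_cxx_api.h>
#include <iostream>

// Diagnostic sketch: list the execution providers compiled into the linked
// ONNX Runtime library. If "CUDAExecutionProvider" is listed, the GPU build
// is in use and may touch the GPU regardless of the instance_group kind.
int main() {
  for (const auto& provider : Ort::GetAvailableProviders()) {
    std::cout << provider << "\n";
  }
  return 0;
}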

I verified that my GPU was functioning normally.
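For what it's worth, Triton typically reports "no GPUs are available" when the server library itself cannot enumerate a usable CUDA device at startup (or was built without GPU support), independent of what nvidia-smi shows. A minimal in-process check, assuming the CUDA 11.8 runtime listed in the environment below (compile and link with -lcudart):

#include <cuda_runtime_api.h>
#include <cstdio>

// Minimal check that this process can see CUDA devices. Triton's GPU
// discovery happens in-process, so a device visible to nvidia-smi can
// still be invisible here (e.g. wrong library paths or CUDA_VISIBLE_DEVICES).
int main() {
  int count = 0;
  const cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  std::printf("visible CUDA devices: %d\n", count);
  return 0;
}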

Environment

TensorRT Version:
GPU Type: RTX2080
Nvidia Driver Version: 535.183.01
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Happy to provide via DM.

Steps To Reproduce

I compiled libtritonserver.so (GPU enabled) following the guidance in the core repository (with slight modifications for my system):

cmake -DTRITON_THIRD_PARTY_REPO_TAG=r22.12 -DTRITON_COMMON_REPO_TAG=r22.12 -DTRITON_CORE_HEADERS_ONLY=OFF -DTRITON_ENABLE_LOGGING=ON -DTRITON_ENABLE_STATS=ON -DTRITON_ENABLE_TRACING=ON -DTRITON_ENABLE_NVTX=OFF -DTRITON_ENABLE_MALI_GPU=ON -DTRITON_ENABLE_METRICS=OFF -DTRITON_ENABLE_METRICS_GPU=OFF -DTRITON_ENABLE_METRICS_CPU=OFF -DTRITON_ENABLE_GCS=OFF -DTRITON_ENABLE_S3=OFF -DTRITON_ENABLE_AZURE_STORAGE=OFF -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_ENABLE_GPU=ON …

I also compiled the libtriton_onnxruntime.so backend (also GPU enabled) as follows:

cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_BUILD_ONNXRUNTIME_VERSION=1.12.1 -DTRITON_ONNXRUNTIME_INCLUDE_PATHS="/path/to/onnxruntime-linux-x64-gpu-1.12.1/include" -DTRITON_ONNXRUNTIME_LIB_PATHS="/path/to/onnxruntime-linux-x64-gpu-1.12.1/lib" -DTRITON_BACKEND_REPO_TAG=r22.12 -DTRITON_CORE_REPO_TAG=r22.12 -DTRITON_COMMON_REPO_TAG=r22.12 -DTRITON_ENABLE_GPU=ON …

Finally, I adapted PeopleNet's config.pbtxt accordingly, changing the backend and slightly modifying the input and output names:

name: "peoplenet"
backend: "onnxruntime"
max_batch_size: 64
input [
  {
    name: "input_1:0"
    data_type: TYPE_FP32
    dims: [ 3, 544, 960 ]
  }
]
output [
  {
    name: "output_bbox/BiasAdd:0"
    data_type: TYPE_FP32
    dims: [ 12, 34, 60 ]
  },
  {
    name: "output_cov/Sigmoid:0"
    data_type: TYPE_FP32
    dims: [ 3, 34, 60 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]

The rest involved correctly placing and linking the relevant shared libraries and the model repository with the people_detection.cc program.
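For anyone reproducing this, the in-process bootstrap looked roughly like the sketch below. This is a simplified illustration using only documented tritonserver.h calls, not our actual people_detection.cc; the repository and backend paths are placeholders:

#include "tritonserver.h"
#include <cstdio>
#include <cstdlib>

// Abort with a message if a Triton C API call returns an error.
static void Check(TRITONSERVER_Error* err, const char* what) {
  if (err != nullptr) {
    std::fprintf(stderr, "%s: %s\n", what, TRITONSERVER_ErrorMessage(err));
    TRITONSERVER_ErrorDelete(err);
    std::exit(1);
  }
}

int main() {
  TRITONSERVER_ServerOptions* options = nullptr;
  Check(TRITONSERVER_ServerOptionsNew(&options), "options new");
  // Model repository containing peoplenet/config.pbtxt and the .onnx file
  // (placeholder path).
  Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(
            options, "/path/to/model_repository"), "set repo");
  // Directory holding onnxruntime/libtriton_onnxruntime.so (placeholder path).
  Check(TRITONSERVER_ServerOptionsSetBackendDirectory(
            options, "/path/to/backends"), "set backends");
  Check(TRITONSERVER_ServerOptionsSetLogVerbose(options, 1), "set verbose");

  TRITONSERVER_Server* server = nullptr;
  Check(TRITONSERVER_ServerNew(&server, options), "server new");
  Check(TRITONSERVER_ServerOptionsDelete(options), "options delete");

  // ... build TRITONSERVER_InferenceRequest objects and run inference ...

  Check(TRITONSERVER_ServerStop(server), "server stop");
  Check(TRITONSERVER_ServerDelete(server), "server delete");
  return 0;
}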

Thank you in advance; I look forward to hearing from you.

We have identified the root cause of these errors: it was a simple mistake in our people_detection.cc. The compilation of the Triton shared library and backends was fine. Thanks!
