Model has kind KIND_GPU but no GPUs are available


I tried to adapt the code and model from the following repository (server/docs/examples/jetson/concurrency_and_dynamic_batching at main · triton-inference-server/server · GitHub) to test Triton Inference Server as a shared library through its C API. I launched the compiled application directly with the raw .onnx model of PeopleNet. Three unexpected behaviors were observed:

  1. With instance_group set to kind: KIND_GPU in config.pbtxt, the following message was always shown:
    peoplenet has kind KIND_GPU but no GPUs are available

  2. With instance_group set to kind: KIND_CPU, there was no complaint and inference completed successfully (with visual detection results), yet GPU memory and utilization were still consumed (as monitored with nvidia-smi).

  3. Even when I compiled without TRITON_ENABLE_GPU, inference still consumed GPU memory and utilization.

I confirmed that my GPU was functioning normally.
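For anyone reproducing this, one way to check GPU visibility from inside a process (rather than from nvidia-smi) is to ask the CUDA driver directly. The following is only a minimal sketch, assuming the driver library is installed as libcuda.so.1; `visible_cuda_devices` is a hypothetical helper name and is not part of Triton. It calls the CUDA driver API functions `cuInit` and `cuDeviceGetCount` through ctypes.

```python
import ctypes


def visible_cuda_devices(lib_name="libcuda.so.1"):
    """Return the number of CUDA devices the driver reports to this process,
    or None if the driver library cannot be loaded or initialised.

    lib_name is an assumption about where the driver library lives.
    """
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # driver library not found on this machine

    # cuInit(0) must succeed before any other driver call; 0 is CUDA_SUCCESS.
    if cuda.cuInit(0) != 0:
        return None

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


if __name__ == "__main__":
    print("CUDA devices visible:", visible_cuda_devices())
```

Running it in the same shell and environment as the application matters, because settings such as CUDA_VISIBLE_DEVICES change what the driver reports to a process.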


TensorRT Version:
GPU Type: RTX2080
Nvidia Driver Version: 535.183.01
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Happy to provide via DM.

Steps To Reproduce

I compiled the Triton shared library (GPU enabled) following the guidance in the core repository (with slight modifications to suit my system).


I also compiled the ONNX Runtime backend (also with GPU enabled) as follows:

cmake -DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/install -DTRITON_BUILD_ONNXRUNTIME_VERSION=1.12.1 -DTRITON_ONNXRUNTIME_INCLUDE_PATHS="/path/to/onnxruntime-linux-x64-gpu-1.12.1/include" -DTRITON_ONNXRUNTIME_LIB_PATHS="/path/to/onnxruntime-linux-x64-gpu-1.12.1/lib" -DTRITON_BACKEND_REPO_TAG=r22.12 -DTRITON_CORE_REPO_TAG=r22.12 -DTRITON_COMMON_REPO_TAG=r22.12 -DTRITON_ENABLE_GPU=ON …

Finally, I adapted PeopleNet's config.pbtxt accordingly by changing the backend and slightly modifying the input and output names:

name: "peoplenet"
backend: "onnxruntime"
max_batch_size: 64
input [
  {
    name: "input_1:0"
    data_type: TYPE_FP32
    dims: [ 3, 544, 960 ]
  }
]
output [
  {
    name: "output_bbox/BiasAdd:0"
    data_type: TYPE_FP32
    dims: [ 12, 34, 60 ]
  },
  {
    name: "output_cov/Sigmoid:0"
    data_type: TYPE_FP32
    dims: [ 3, 34, 60 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
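
When switching between KIND_GPU and KIND_CPU for testing, it can help to confirm which instance kinds the config file on disk actually requests before launching. This is a rough sketch only: `instance_group_kinds` is a hypothetical helper, it uses a plain regular expression rather than a real protobuf text parser, and it would also match a `kind:` entry inside a comment.

```python
import re


def instance_group_kinds(config_text):
    """Return every KIND_* value that follows a `kind:` key in the text."""
    return re.findall(r"\bkind\s*:\s*(KIND_[A-Z_]+)", config_text)


# Sample text mirroring the instance_group section of the config above.
sample = """
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
"""

print(instance_group_kinds(sample))  # ['KIND_GPU']
```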

The rest involved placing the relevant shared libraries and the model repository correctly and linking them with the script.

Thank you in advance and looking forward to hearing from you.

We have identified the root cause of these errors. It was a simple mistake on our side; compilation of the Triton shared library and the backends was fine. Thanks!
