`Error No Op registered for NMSDynamic_TRT...` when trying to run Triton Inference Server with an SSD model

Please provide the following information when requesting support.

• Hardware Xavier / DGX
• Network Type SSD
• TLT Version Model generated in TAO 5.0.0 (attempted to run on AGX Xavier (and DGX) / Docker nvcr.io/nvidia/tritonserver:23.08-py3)

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

  1. Export the model as shown in the example notebook (I used a different dataset, but the model .onnx file was generated successfully).
  2. Set up the model repository for the inference server.
  3. Run the command: docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models

I used a Jetson.

Log:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66821655)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I0925 16:31:17.683765 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0925 16:31:17.683949 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0925 16:31:17.684002 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W0925 16:31:17.684197 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0925 16:31:17.684262 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I0925 16:31:17.687395 1 model_lifecycle.cc:462] loading: nozzlenet_onnx:1
I0925 16:31:17.695515 1 onnxruntime.cc:2514] TRITONBACKEND_Initialize: onnxruntime
I0925 16:31:17.695649 1 onnxruntime.cc:2524] Triton TRITONBACKEND API version: 1.15
I0925 16:31:17.695701 1 onnxruntime.cc:2530] 'onnxruntime' TRITONBACKEND API version: 1.15
I0925 16:31:17.695749 1 onnxruntime.cc:2560] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0925 16:31:17.840204 1 onnxruntime.cc:2625] TRITONBACKEND_ModelInitialize: nozzlenet_onnx (version 1)
I0925 16:31:18.070677 1 onnxruntime.cc:2666] TRITONBACKEND_ModelFinalize: delete model state
E0925 16:31:18.070914 1 model_lifecycle.cc:622] failed to load 'nozzlenet_onnx' version 1: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15
I0925 16:31:18.071039 1 model_lifecycle.cc:757] failed to load 'nozzlenet_onnx'
I0925 16:31:18.071599 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0925 16:31:18.072023 1 server.cc:631] 
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}                                                                                                                                                            |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0925 16:31:18.072340 1 server.cc:674] 
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model          | Version | Status                                                                                                                                                                                                                                                                                                                                                                                     |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nozzlenet_onnx | 1       | UNAVAILABLE: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15 |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0925 16:31:18.073954 1 metrics.cc:703] Collecting CPU metrics
I0925 16:31:18.074877 1 tritonserver.cc:2435] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.37.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0925 16:31:18.075001 1 server.cc:305] Waiting for in-flight requests to complete.
I0925 16:31:18.075051 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0925 16:31:18.075101 1 server.cc:336] All models are stopped, unloading models
I0925 16:31:18.075144 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Then I used the DGX (I still got the same results).

g@nvdgx:~/Workspace/Triton$ docker run --rm -p 9100:8000 -p 9101:8001 -p 9102:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.2 driver version 535.86.10 with kernel driver version 470.161.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0925 17:34:37.150977 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0925 17:34:37.151027 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0925 17:34:37.151033 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
I0925 17:34:37.311806 1 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f6724000000' with size 268435456
I0925 17:34:37.316781 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0925 17:34:37.316789 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0925 17:34:37.316792 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0925 17:34:37.316794 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
I0925 17:34:37.646124 1 model_lifecycle.cc:462] loading: nozzlenet_onnx:1
I0925 17:34:37.647384 1 onnxruntime.cc:2514] TRITONBACKEND_Initialize: onnxruntime
I0925 17:34:37.647402 1 onnxruntime.cc:2524] Triton TRITONBACKEND API version: 1.15
I0925 17:34:37.647407 1 onnxruntime.cc:2530] 'onnxruntime' TRITONBACKEND API version: 1.15
I0925 17:34:37.647410 1 onnxruntime.cc:2560] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0925 17:34:37.669830 1 onnxruntime.cc:2625] TRITONBACKEND_ModelInitialize: nozzlenet_onnx (version 1)
I0925 17:34:37.743687 1 onnxruntime.cc:2666] TRITONBACKEND_ModelFinalize: delete model state
E0925 17:34:37.743723 1 model_lifecycle.cc:622] failed to load 'nozzlenet_onnx' version 1: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15
I0925 17:34:37.743735 1 model_lifecycle.cc:757] failed to load 'nozzlenet_onnx'
I0925 17:34:37.743794 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0925 17:34:37.743853 1 server.cc:631] 
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}                                                                                                                                                            |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0925 17:34:37.743895 1 server.cc:674] 
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model          | Version | Status                                                                                                                                                                                                                                                                                                                                                                                     |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nozzlenet_onnx | 1       | UNAVAILABLE: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15 |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0925 17:34:37.843362 1 metrics.cc:810] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0925 17:34:37.843394 1 metrics.cc:810] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0925 17:34:37.843400 1 metrics.cc:810] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0925 17:34:37.843405 1 metrics.cc:810] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0925 17:34:37.845158 1 metrics.cc:703] Collecting CPU metrics
I0925 17:34:37.845356 1 tritonserver.cc:2435] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.37.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                                                                                                                                                        |
| cuda_memory_pool_byte_size{2}    | 67108864                                                                                                                                                                                                        |
| cuda_memory_pool_byte_size{3}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0925 17:34:37.845365 1 server.cc:305] Waiting for in-flight requests to complete.
I0925 17:34:37.845373 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0925 17:34:37.845377 1 server.cc:336] All models are stopped, unloading models
I0925 17:34:37.845380 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

And then I tried with vanilla Python with onnx installed:

import onnx
import onnxruntime

ssd_model = onnx.load("model.onnx")
ort_session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

I got the following error output:

InvalidGraph                              Traceback (most recent call last)
/home/nvidia/Workspace/nozzlenet_pytorch.ipynb Cell 4 line 1
----> 1 ort_session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

File /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:419, in InferenceSession.__init__(self, path_or_bytes, sess_options, providers, provider_options, **kwargs)
    416 disabled_optimizers = kwargs["disabled_optimizers"] if "disabled_optimizers" in kwargs else None
    418 try:
--> 419     self._create_inference_session(providers, provider_options, disabled_optimizers)
    420 except (ValueError, RuntimeError) as e:
    421     if self._enable_fallback:

File /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:460, in InferenceSession._create_inference_session(self, providers, provider_options, disabled_optimizers)
    458 session_options = self._sess_options if self._sess_options else C.get_default_session_options()
    459 if self._model_path:
--> 460     sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    461 else:
    462     sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)

InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15

It looks like in all cases the common culprit is the NMSDynamic_TRT plugin.
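
To double-check that, here is a minimal sketch (assuming model.onnx is in the working directory) that lists any node types in the graph that are not part of the standard ONNX opset:

import onnx

model = onnx.load("model.onnx")
# Op types defined by the standard ONNX opset
standard_ops = {schema.name for schema in onnx.defs.get_all_schemas()}
# Anything outside that set is a custom op (NMSDynamic_TRT in this case)
unknown_ops = {node.op_type for node in model.graph.node if node.op_type not in standard_ops}
print(unknown_ops)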

How do I install the NMSDynamic_TRT plugin for the ONNX runtime?

  1. for Jetson
  2. for DGX?

Also, I feel like this might be a problem when creating .trt engines on Jetson starting from a .onnx file as well (the TensorRT engine gets built in the TAO Toolkit on DGX with no problems).

My aims:

  1. To be able to run forward passes of the model within a notebook to get bounding boxes. I will then convert this into a serverless function that feeds back into our labelling workflows (I am open to using Triton for inference and the serverless function only as a message broker, as I have to use the serverless function to plug in the automatic annotator).

  2. To generate engine files for DeepStream. I think TensorRT on Jetson will be able to parse the .onnx file with its built-in plugins (maybe with the NVIDIA OSS and custom parsing library installed separately, as it used to be for 4.x JetPacks?). I am using JetPack 5.1 / DeepStream 6.2 for this.


I suggest you use trtexec, which is mentioned in https://github.com/NVIDIA/TensorRT/tree/main/quickstart/deploy_to_triton#step-2-set-up-triton-inference-server, i.e., please generate model.plan (it is a TensorRT engine) from your ONNX file. Then configure it in the Triton server.

I've already done that! (Please let me know which bit I need to elaborate on.)

You can log in to the tritonserver docker:

docker run -it --gpus all -v /localpath:/dockerpath  nvcr.io/nvidia/tritonserver:23.08-py3 /bin/bash

Inside the docker, generate the TensorRT engine.
# trtexec xxx
The command line for SSD can be found in TRTEXEC with SSD - NVIDIA Docs.

Then inspect the engine:

polygraphy inspect model model.plan --mode=basic

And configure the engine. Also, please note to set platform: "tensorrt_plan" according to
https://github.com/NVIDIA/TensorRT/blob/main/quickstart/deploy_to_triton/config.pbtxt#L19.

Thanks @Morganh. I found trtexec (already installed) and updated my bashrc (added the directory to the path), and I'm now generating the engine! I will use TRTEXEC with SSD - NVIDIA Docs.

Can I use the nvinfer_config.txt that was generated in TAO?

net-scale-factor=1.0
offsets=103.939;116.779;123.68
infer-dims=3;736;1280
tlt-model-key=tlt_encode
network-type=0
model-color-format=1
maintain-aspect-ratio=0
output-tensor-meta=0

I noticed that just running trtexec (trtexec --onnx=model.onnx --saveEngine=model.plan --explicitBatch --useCudaGraph) takes a very long time! (It has been running for about 10 minutes, still stuck on:)

[09/26/2023-09:37:54] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +82, GPU +132, now: CPU 744, GPU 4261 (MiB)
[09/26/2023-09:37:54] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.

Anyway, I will now maybe cancel and follow TRTEXEC with SSD - NVIDIA Docs and then follow the steps onward from there.

UPDATE: trtexec finished and I got a model.plan file now. I will try again with:

docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models

For example,

trtexec --onnx=/path/to/model.onnx \
        --maxShapes=Input:1x3xheightxwidth \
        --minShapes=Input:1x3xheightxwidth \
        --optShapes=Input:1x3xheightxwidth \
        --fp16 \
        --saveEngine=/path/to/save/trt/model.engine

and wait for the success of SSD TensorRT engine generation.
The height and width depend on the actual height/width in your training spec file.
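
Given the infer-dims=3;736;1280 in the nvinfer_config.txt shown above, the shapes would presumably be 1x3x736x1280 (assuming the exported input tensor is named Input, as in the example), for example:

trtexec --onnx=model.onnx \
        --minShapes=Input:1x3x736x1280 \
        --optShapes=Input:1x3x736x1280 \
        --maxShapes=Input:1x3x736x1280 \
        --fp16 \
        --saveEngine=model.plan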

In tritonserver, the config.pbtxt should look like https://github.com/NVIDIA/TensorRT/blob/main/quickstart/deploy_to_triton/config.pbtxt.
You need to inspect the engine to set the correct input layer and output layer, etc.
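
A minimal sketch of what that config.pbtxt could look like for this engine; the tensor names and dims below are assumptions (input name from the trtexec example, outputs NMS / NMS_1 from the exported graph) and should be replaced with whatever polygraphy reports for the actual engine:

name: "nozzlenet_onnx"        # must match the model repository directory name
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "Input"             # assumption: confirm with polygraphy inspect
    data_type: TYPE_FP32
    dims: [ 1, 3, 736, 1280 ]
  }
]
output [
  {
    name: "NMS"               # assumption: detection output of the NMS node
    data_type: TYPE_FP32
    dims: [ 1, 200, 7 ]       # assumption: depends on the keepTopK export setting
  },
  {
    name: "NMS_1"             # assumption: kept-box count output
    data_type: TYPE_FP32
    dims: [ 1, 1, 1 ]
  }
]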


Everything seems to be alright, but it seems like Triton is struggling to find the GPU on the Jetson (I'm running on a Jetson AGX Xavier).

I'm using the command below.

Is there an L4T Triton image from NGC as such?

ganindu@ubuntu:~$ docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66821655)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I0926 11:43:39.987287 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0926 11:43:39.987458 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0926 11:43:39.987518 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W0926 11:43:39.987734 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0926 11:43:39.987877 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I0926 11:43:39.997829 1 model_lifecycle.cc:462] loading: nozzlenet-v2:1
I0926 11:43:40.001598 1 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0926 11:43:40.001713 1 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.15
I0926 11:43:40.001771 1 tensorrt.cc:81] 'tensorrt' TRITONBACKEND API version: 1.15
I0926 11:43:40.001822 1 tensorrt.cc:105] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0926 11:43:40.002705 1 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: nozzlenet-v2 (version 1)
I0926 11:43:40.007691 1 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: nozzlenet-v2_0 (CPU device 0)
I0926 11:43:40.008567 1 tensorrt.cc:344] TRITONBACKEND_ModelInstanceFinalize: delete instance state
E0926 11:43:40.008699 1 backend_model.cc:553] ERROR: Failed to create instance: unable to load model 'nozzlenet-v2', TensorRT backend supports only GPU device
I0926 11:43:40.008795 1 tensorrt.cc:265] TRITONBACKEND_ModelFinalize: delete model state
E0926 11:43:40.008879 1 model_lifecycle.cc:622] failed to load 'nozzlenet-v2' version 1: Invalid argument: unable to load model 'nozzlenet-v2', TensorRT backend supports only GPU device
I0926 11:43:40.009057 1 model_lifecycle.cc:757] failed to load 'nozzlenet-v2'
I0926 11:43:40.009466 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0926 11:43:40.009825 1 server.cc:631] 
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                      | Config                                                                                                                                                        |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch  | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so   | {}                                                                                                                                                            |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0926 11:43:40.010046 1 server.cc:674] 
+--------------+---------+---------------------------------------------------------------------------------------------------------------+
| Model        | Version | Status                                                                                                        |
+--------------+---------+---------------------------------------------------------------------------------------------------------------+
| nozzlenet-v2 | 1       | UNAVAILABLE: Invalid argument: unable to load model 'nozzlenet-v2', TensorRT backend supports only GPU device |
+--------------+---------+---------------------------------------------------------------------------------------------------------------+

I0926 11:43:40.011327 1 metrics.cc:703] Collecting CPU metrics
I0926 11:43:40.012150 1 tritonserver.cc:2435] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.37.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0926 11:43:40.012232 1 server.cc:305] Waiting for in-flight requests to complete.
I0926 11:43:40.012282 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0926 11:43:40.012341 1 server.cc:336] All models are stopped, unloading models
I0926 11:43:40.012408 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Note: The Triton server works on the DGX and does not work on the Jetson.

It is a topic about running the Triton server on Jetson. Please refer to Support for Triton Inference Server on Jetson NX.

Thanks!! I will mark using trtexec as the solution. However, is there no way to run forward passes as I had originally intended without using TensorRT? I'm asking because I thought with TAO 5 we'd get the ability to use TAO models with vanilla ONNX or PyTorch (for SSDs, maybe import them as PyTorch models?).

In this case, please refer to Errors while reading ONNX file produced by TAO 5


Thanks. I've been trying out both using a TensorRT engine and using Triton (on DGX, as I've shown in the replies above).

Here I thought using run --gpus all was not correct when using the NVIDIA runtime (I haven't tested it because I keep gaining traction with the TensorRT method).
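
If I get back to testing Triton on the Jetson, my sketch of the command would be the following (assuming the NVIDIA container runtime is configured on the device; the image tag may also need to be a Jetson-specific one from NGC rather than the x86 tag I used):

docker run --rm --runtime nvidia -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v ${PWD}/model_repository:/models \
    nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models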

I also haven't tried the method of clipping the NMSDynamic_TRT node to get the .onnx file compatible with the ONNX opset, but it seems like this code can be adapted to trim the file.

I guess in my case I need to have:

graph.outputs = [tensors["anchor_data"].to_variable(dtype=np.float32), tensors["loc_data"].to_variable(dtype=np.float32), tensors["conf_data"].to_variable(dtype=np.float32) ]

and implement NMS on my own.
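
A minimal sketch of that graph surgery as I understand it, using onnx-graphsurgeon (the output filename is my own placeholder):

import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Load the TAO-exported model and look up its tensors by name
graph = gs.import_onnx(onnx.load("model.onnx"))
tensors = graph.tensors()

# Re-route the graph outputs to the tensors that feed the NMSDynamic_TRT node
graph.outputs = [
    tensors["anchor_data"].to_variable(dtype=np.float32),
    tensors["loc_data"].to_variable(dtype=np.float32),
    tensors["conf_data"].to_variable(dtype=np.float32),
]

# Drop the now-dangling NMS node and save a plain-opset ONNX file
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_no_nms.onnx")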

Thanks a lot!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.