Please provide the following information when requesting support.
• Hardware: Xavier / DGX
• Network Type: SSD
• TLT Version: Model generated in TAO 5.0.0 (attempted to run on AGX Xavier and on DGX, using Docker image nvcr.io/nvidia/tritonserver:23.08-py3)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
- Export the model as shown in the example notebook (I used a different dataset, but the model .onnx file was generated successfully).
- Set up the model repository for the Triton Inference Server (a sketch of the layout follows the command below).
- Run the command:
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models
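For reference, the model repository follows the standard Triton layout (the paths match the log below; a config.pbtxt is optional here since the server runs with auto-complete-config enabled):

model_repository/
  nozzlenet_onnx/
    config.pbtxt   (optional, since auto-complete-config is on)
    1/
      model.onnx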
I first ran this on the Jetson (AGX Xavier).

Log:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.08 (build 66821655)
Triton Server Version 2.37.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I0925 16:31:17.683765 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0925 16:31:17.683949 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0925 16:31:17.684002 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W0925 16:31:17.684197 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0925 16:31:17.684262 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I0925 16:31:17.687395 1 model_lifecycle.cc:462] loading: nozzlenet_onnx:1
I0925 16:31:17.695515 1 onnxruntime.cc:2514] TRITONBACKEND_Initialize: onnxruntime
I0925 16:31:17.695649 1 onnxruntime.cc:2524] Triton TRITONBACKEND API version: 1.15
I0925 16:31:17.695701 1 onnxruntime.cc:2530] 'onnxruntime' TRITONBACKEND API version: 1.15
I0925 16:31:17.695749 1 onnxruntime.cc:2560] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0925 16:31:17.840204 1 onnxruntime.cc:2625] TRITONBACKEND_ModelInitialize: nozzlenet_onnx (version 1)
I0925 16:31:18.070677 1 onnxruntime.cc:2666] TRITONBACKEND_ModelFinalize: delete model state
E0925 16:31:18.070914 1 model_lifecycle.cc:622] failed to load 'nozzlenet_onnx' version 1: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15
I0925 16:31:18.071039 1 model_lifecycle.cc:757] failed to load 'nozzlenet_onnx'
I0925 16:31:18.071599 1 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0925 16:31:18.072023 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0925 16:31:18.072340 1 server.cc:674]
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nozzlenet_onnx | 1 | UNAVAILABLE: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15 |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0925 16:31:18.073954 1 metrics.cc:703] Collecting CPU metrics
I0925 16:31:18.074877 1 tritonserver.cc:2435]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.37.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0925 16:31:18.075001 1 server.cc:305] Waiting for in-flight requests to complete.
I0925 16:31:18.075051 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0925 16:31:18.075101 1 server.cc:336] All models are stopped, unloading models
I0925 16:31:18.075144 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Then I used the DGX (and still got the same result):
g@nvdgx:~/Workspace/Triton$ docker run --rm -p 9100:8000 -p 9101:8001 -p 9102:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.2 driver version 535.86.10 with kernel driver version 470.161.03.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
I0925 17:34:37.150977 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0925 17:34:37.151027 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0925 17:34:37.151033 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
I0925 17:34:37.311806 1 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f6724000000' with size 268435456
I0925 17:34:37.316781 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0925 17:34:37.316789 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0925 17:34:37.316792 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0925 17:34:37.316794 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
I0925 17:34:37.646124 1 model_lifecycle.cc:462] loading: nozzlenet_onnx:1
I0925 17:34:37.647384 1 onnxruntime.cc:2514] TRITONBACKEND_Initialize: onnxruntime
I0925 17:34:37.647402 1 onnxruntime.cc:2524] Triton TRITONBACKEND API version: 1.15
I0925 17:34:37.647407 1 onnxruntime.cc:2530] 'onnxruntime' TRITONBACKEND API version: 1.15
I0925 17:34:37.647410 1 onnxruntime.cc:2560] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0925 17:34:37.669830 1 onnxruntime.cc:2625] TRITONBACKEND_ModelInitialize: nozzlenet_onnx (version 1)
I0925 17:34:37.743687 1 onnxruntime.cc:2666] TRITONBACKEND_ModelFinalize: delete model state
E0925 17:34:37.743723 1 model_lifecycle.cc:622] failed to load 'nozzlenet_onnx' version 1: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15
I0925 17:34:37.743735 1 model_lifecycle.cc:757] failed to load 'nozzlenet_onnx'
I0925 17:34:37.743794 1 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0925 17:34:37.743853 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0925 17:34:37.743895 1 server.cc:674]
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nozzlenet_onnx | 1 | UNAVAILABLE: Internal: onnx runtime error 10: Load model from /models/nozzlenet_onnx/1/model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15 |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0925 17:34:37.843362 1 metrics.cc:810] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0925 17:34:37.843394 1 metrics.cc:810] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0925 17:34:37.843400 1 metrics.cc:810] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0925 17:34:37.843405 1 metrics.cc:810] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0925 17:34:37.845158 1 metrics.cc:703] Collecting CPU metrics
I0925 17:34:37.845356 1 tritonserver.cc:2435]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.37.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0925 17:34:37.845365 1 server.cc:305] Waiting for in-flight requests to complete.
I0925 17:34:37.845373 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0925 17:34:37.845377 1 server.cc:336] All models are stopped, unloading models
I0925 17:34:37.845380 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
And then I tried plain Python with onnx and onnxruntime installed:
import onnx
import onnxruntime

ssd_model = onnx.load("model.onnx")
ort_session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
I got this error output:
InvalidGraph Traceback (most recent call last)
/home/nvidia/Workspace/nozzlenet_pytorch.ipynb Cell 4 line 1
----> 1 ort_session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
File /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:419, in InferenceSession.__init__(self, path_or_bytes, sess_options, providers, provider_options, **kwargs)
416 disabled_optimizers = kwargs["disabled_optimizers"] if "disabled_optimizers" in kwargs else None
418 try:
--> 419 self._create_inference_session(providers, provider_options, disabled_optimizers)
420 except (ValueError, RuntimeError) as e:
421 if self._enable_fallback:
File /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:460, in InferenceSession._create_inference_session(self, providers, provider_options, disabled_optimizers)
458 session_options = self._sess_options if self._sess_options else C.get_default_session_options()
459 if self._model_path:
--> 460 sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
461 else:
462 sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from model.onnx failed:This is an invalid model. In Node, ("NMS", NMSDynamic_TRT, "", -1) : ("anchor_data": tensor(float),"loc_data": tensor(float),"conf_data": tensor(float),) -> ("NMS": tensor(float),"NMS_1": tensor(float),) , Error No Op registered for NMSDynamic_TRT with domain_version of 15
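A quick way to confirm the custom op is baked into the exported graph is to list the op types (a minimal sketch using the onnx package):

import onnx

model = onnx.load("model.onnx")
# Collect every distinct op type used in the graph
print(sorted({node.op_type for node in model.graph.node}))
# NMSDynamic_TRT shows up in this list: it is a TensorRT plugin op, not part
# of the standard ONNX op set, which is why onnxruntime refuses to load it.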
It looks like in all cases the common culprit is the NMSDynamic_TRT plugin.
How do I install the NMSDynamic_TRT plugin for the ONNX Runtime
- for Jetson?
- for DGX?
I also suspect this might be a problem when creating .trt engines as well (on the Jetson, starting from the .onnx file; on the DGX, the TRT engine gets built inside the TAO Toolkit with no problems).
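For what it's worth, this is roughly the engine build I have in mind for the Jetson, a minimal sketch using the standard TensorRT 8.x Python API (file names are placeholders); registering the bundled plugins is what should make NMSDynamic_TRT resolvable at parse time:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
# Register TensorRT's bundled plugins (NMSDynamic_TRT lives here)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
with open("model.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))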
My aims:
- To run forward passes of the model within a notebook to get bounding boxes. I will then convert this into a serverless function that feeds back into our labelling workflows (I am open to using Triton for inferencing and the serverless function only as a message broker, since I have to use the serverless function to plug in the automatic annotator). A sketch of the client call I have in mind follows this list.
- To generate engine files for DeepStream. I think TensorRT on the Jetson will be able to parse the .onnx file with its built-in plugins (maybe with the NVIDIA OSS and custom parsing lib installed separately, as used to be needed on 4.x JetPacks?). I am using JetPack 5.1 / DeepStream 6.2 for this.
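For the first aim, the client call would look roughly like this, a sketch using tritonclient where the input name and shape are placeholders (the real ones can be read from the model metadata once the model loads; the NMS / NMS_1 output names come from the error log above):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input name/shape -- query the real ones with
# client.get_model_metadata("nozzlenet_onnx") once the model loads.
image = np.zeros((1, 3, 300, 300), dtype=np.float32)
inp = httpclient.InferInput("Input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

result = client.infer("nozzlenet_onnx", inputs=[inp])
boxes = result.as_numpy("NMS")  # output names NMS / NMS_1 from the model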