I think the model above was actually generated from the tao-toolkit-triton-apps repo: I cloned tao-toolkit-triton-apps and made a small modification in tao-toolkit-triton-apps/download_and_convert.sh:
echo "Converting the Electric_bicycle_net_tao model"
mkdir -p /model_repository/electric_bicycle_net_tao/1
tao-converter /tao_models/electric_bicycle_net_tao/final_model.etlt \
-k nvidia_tlt \
-d 3,224,224 \
-o predictions/Softmax \
-m 16 \
-e /model_repository/electric_bicycle_net_tao/1/model.plan
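For Triton to load the converted plan file, the model directory also needs a config.pbtxt next to the version folder. A minimal sketch of what that could look like for this model — the input tensor name ("input_1") and the output class count (dims: [ 2 ]) are assumptions here and must be adjusted to match the actual network:

name: "electric_bicycle_net_tao"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "input_1"            # assumed name; check your model's input tensor
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]      # matches the -d 3,224,224 passed to tao-converter
  }
]
output [
  {
    name: "predictions/Softmax"
    data_type: TYPE_FP32
    dims: [ 2 ]                # assumed number of classes
  }
]

max_batch_size matches the -m 16 given to tao-converter above.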
When the Triton server starts, I can see that the model was converted correctly:
...
...
...
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2565, GPU 1702 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2546 MiB, GPU 1702 MiB
Converting the Electric_bicycle_net_tao model
[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 560 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 629 MiB, GPU 560 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +791, GPU +340, now: CPU 1464, GPU 900 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +195, GPU +342, now: CPU 1659, GPU 1242 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 94864
[INFO] Total Device Persistent Memory: 46283264
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 53 MiB, GPU 32 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2634, GPU 1768 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2634, GPU 1776 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2634, GPU 1760 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2634, GPU 1742 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2634 MiB, GPU 1742 MiB
I0323 11:48:38.459937 64 metrics.cc:298] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060
I0323 11:48:38.625626 64 libtorch.cc:1092] TRITONBACKEND_Initialize: pytorch
I0323 11:48:38.625643 64 libtorch.cc:1102] Triton TRITONBACKEND API version: 1.6
I0323 11:48:38.625646 64 libtorch.cc:1108] 'pytorch' TRITONBACKEND API version: 1.6
2022-03-23 11:48:38.818313: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0323 11:48:38.844581 64 tensorflow.cc:2170] TRITONBACKEND_Initialize: tensorflow
I0323 11:48:38.844603 64 tensorflow.cc:2180] Triton TRITONBACKEND API version: 1.6
I0323 11:48:38.844606 64 tensorflow.cc:2186] 'tensorflow' TRITONBACKEND API version: 1.6
I0323 11:48:38.844609 64 tensorflow.cc:2210] backend configuration:
{}
I0323 11:48:38.845561 64 onnxruntime.cc:1999] TRITONBACKEND_Initialize: onnxruntime
I0323 11:48:38.845572 64 onnxruntime.cc:2009] Triton TRITONBACKEND API version: 1.6
I0323 11:48:38.845575 64 onnxruntime.cc:2015] 'onnxruntime' TRITONBACKEND API version: 1.6
I0323 11:48:38.876770 64 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0323 11:48:38.876789 64 openvino.cc:1203] Triton TRITONBACKEND API version: 1.6
I0323 11:48:38.876792 64 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.6
I0323 11:48:39.002462 64 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fd9e6000000' with size 268435456
I0323 11:48:39.002599 64 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0323 11:48:39.003531 64 model_repository_manager.cc:1022] loading: vehicletypenet_tao:1
I0323 11:48:39.103978 64 model_repository_manager.cc:1022] loading: electric_bicycle_net_tao:1
I0323 11:48:39.133371 64 tensorrt.cc:4925] TRITONBACKEND_Initialize: tensorrt
I0323 11:48:39.133394 64 tensorrt.cc:4935] Triton TRITONBACKEND API version: 1.6
I0323 11:48:39.133398 64 tensorrt.cc:4941] 'tensorrt' TRITONBACKEND API version: 1.6
I0323 11:48:39.133477 64 tensorrt.cc:4984] backend configuration:
{}
I0323 11:48:39.133680 64 tensorrt.cc:5036] TRITONBACKEND_ModelInitialize: vehicletypenet_tao (version 1)
I0323 11:48:39.135022 64 tensorrt.cc:5085] TRITONBACKEND_ModelInstanceInitialize: vehicletypenet_tao (GPU device 0)
I0323 11:48:39.509504 64 logging.cc:49] [MemUsageChange] Init CUDA: CPU +525, GPU +0, now: CPU 648, GPU 624 (MiB)
I0323 11:48:39.513867 64 logging.cc:49] Loaded engine size: 5 MB
I0323 11:48:39.513959 64 logging.cc:49] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 659 MiB, GPU 624 MiB
I0323 11:48:40.022610 64 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +792, GPU +340, now: CPU 1451, GPU 970 (MiB)
I0323 11:48:40.441824 64 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +195, GPU +336, now: CPU 1646, GPU 1306 (MiB)
I0323 11:48:40.442715 64 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1646, GPU 1288 (MiB)
I0323 11:48:40.442756 64 logging.cc:49] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1646 MiB, GPU 1288 MiB
I0323 11:48:40.442897 64 logging.cc:49] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1635 MiB, GPU 1288 MiB
I0323 11:48:40.443158 64 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1635, GPU 1298 (MiB)
I0323 11:48:40.443830 64 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1635, GPU 1306 (MiB)
I0323 11:48:40.444192 64 logging.cc:49] [MemUsageSnapshot] ExecutionContext creation end: CPU 1635 MiB, GPU 1326 MiB
I0323 11:48:40.444274 64 tensorrt.cc:1379] Created instance vehicletypenet_tao on GPU 0 with stream priority 0
I0323 11:48:40.444292 64 tensorrt.cc:5036] TRITONBACKEND_ModelInitialize: electric_bicycle_net_tao (version 1)
I0323 11:48:40.444393 64 model_repository_manager.cc:1183] successfully loaded 'vehicletypenet_tao' version 1
I0323 11:48:40.445171 64 tensorrt.cc:5085] TRITONBACKEND_ModelInstanceInitialize: electric_bicycle_net_tao (GPU device 0)
I0323 11:48:40.445384 64 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1635, GPU 1326 (MiB)
I0323 11:48:40.475368 64 logging.cc:49] Loaded engine size: 44 MB
I0323 11:48:40.475466 64 logging.cc:49] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1724 MiB, GPU 1326 MiB
I0323 11:48:40.569055 64 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1724, GPU 1386 (MiB)
I0323 11:48:40.569464 64 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1724, GPU 1396 (MiB)
I0323 11:48:40.569926 64 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1724, GPU 1380 (MiB)
I0323 11:48:40.569967 64 logging.cc:49] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1724 MiB, GPU 1380 MiB
I0323 11:48:40.572716 64 logging.cc:49] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1635 MiB, GPU 1380 MiB
I0323 11:48:40.572975 64 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 1636, GPU 1388 (MiB)
I0323 11:48:40.573242 64 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1636, GPU 1396 (MiB)
I0323 11:48:40.573912 64 logging.cc:49] [MemUsageSnapshot] ExecutionContext creation end: CPU 1636 MiB, GPU 1516 MiB
I0323 11:48:40.573991 64 tensorrt.cc:1379] Created instance electric_bicycle_net_tao on GPU 0 with stream priority 0
I0323 11:48:40.574091 64 model_repository_manager.cc:1183] successfully loaded 'electric_bicycle_net_tao' version 1
I0323 11:48:40.574142 64 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0323 11:48:40.574177 64 server.cc:549]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I0323 11:48:40.574200 64 server.cc:592]
+--------------------------+---------+--------+
| Model | Version | Status |
+--------------------------+---------+--------+
| electric_bicycle_net_tao | 1 | READY |
| vehicletypenet_tao | 1 | READY |
+--------------------------+---------+--------+
...
...
...
and this converted model is the one I use in my Python test script.
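For reference, a minimal sketch of how such a Python test script could query the loaded model through Triton's HTTP endpoint. The input tensor name ("input_1"), the [0, 1] scaling in the preprocessing, and the server URL are assumptions, not taken from the actual script, and should be adjusted to the real config.pbtxt and TAO preprocessing mode:

```python
import numpy as np

def preprocess(image):
    """Assumes `image` is already an HxWx3 uint8 array at 224x224.
    Converts it to the NCHW float32 batch layout the TensorRT engine
    expects; the [0, 1] scaling is an assumption about the TAO
    preprocessing and may need mean subtraction instead."""
    chw = image.astype(np.float32).transpose(2, 0, 1) / 255.0
    return chw[np.newaxis, ...]  # add batch dim -> (1, 3, 224, 224)

def classify(batch, url="localhost:8000", model="electric_bicycle_net_tao"):
    """Requires `pip install tritonclient[http]` and a running Triton server."""
    import tritonclient.http as httpclient
    client = httpclient.InferenceServerClient(url=url)
    inp = httpclient.InferInput("input_1", batch.shape, "FP32")  # name assumed
    inp.set_data_from_numpy(batch)
    out = httpclient.InferRequestedOutput("predictions/Softmax")
    result = client.infer(model, inputs=[inp], outputs=[out])
    return result.as_numpy("predictions/Softmax")

if __name__ == "__main__":
    dummy = np.zeros((224, 224, 3), dtype=np.uint8)
    print(preprocess(dummy).shape)  # (1, 3, 224, 224)
```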