Failed to load TAO models on Triton

lprnet | 1 | UNAVAILABLE: Invalid argument: model 'lprnet_0_gpu0', tensor 'tf_op_layer_ArgMax': the model expects 2 dimensions (shape [-1,30]) but the model configuration specifies 2 dimensions (an initial batch dimension because max_batch_size > 0 followed by the explicit tensor shape, making complete shape [-1,24]) |

I am facing the above issue when trying to load a TAO model on Triton.

Please share the full log.

I0721 09:19:22.693839 1 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    |                                                                 | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so        | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so      | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0721 09:19:22.693950 1 server.cc:586]
+--------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model  | Version | Status                                                                                                                                                                                                                                                                                                             |
+--------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| lprnet | 1       | UNAVAILABLE: Invalid argument: model 'lprnet_0_gpu0', tensor 'tf_op_layer_ArgMax': the model expects 2 dimensions (shape [-1,30]) but the model configuration specifies 2 dimensions (an initial batch dimension because max_batch_size > 0 followed by the explicit tensor shape, making complete shape [-1,24]) |
+--------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0721 09:19:22.694072 1 tritonserver.cc:1718]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                   |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                  |
| server_version                   | 2.12.0                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /models                                                                                                                                                                 |
| model_control_mode               | MODE_NONE                                                                                                                                                               |
| strict_model_config              | 1                                                                                                                                                                       |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                               |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                |
| min_supported_compute_capability | 6.0                                                                                                                                                                     |
| strict_readiness                 | 1                                                                                                                                                                       |
| exit_timeout                     | 30                                                                                                                                                                      |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0721 09:19:22.694081 1 server.cc:234] Waiting for in-flight requests to complete.

I0721 09:19:22.694501 1 server.cc:249] Timeout 30: Found 6 live models and 0 in-flight non-inference requests
I0721 09:19:22.715420 1 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -8, now: CPU 660, GPU 2606 (MiB)
I0721 09:19:22.715768 1 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 660, GPU 2598 (MiB)
I0721 09:19:22.723520 1 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -360, now: CPU 660, GPU 2230 (MiB)
I0721 09:19:22.741913 1 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -2, now: CPU 660, GPU 1278 (MiB)
I0721 09:19:22.745553 1 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -8, now: CPU 660, GPU 1262 (MiB)
I0721 09:19:22.751650 1 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -66, now: CPU 660, GPU 1196 (MiB)

I0721 09:19:23.694623 1 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Please double-check that you followed GitHub - NVIDIA-AI-IOT/tao-toolkit-triton-apps: Sample app code for deploying TAO Toolkit trained models to Triton.
I do not recall this issue coming up in feedback from other users.

I have loaded classification and YOLO models on Triton.

I am facing this issue only with LPRNet.

Can you try the default LPRNet model?
That is, please follow the steps without any changes, as sketched below.
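
For reference, a minimal sketch of that flow (the script name is taken from the repo linked above; exact steps may differ by release):

  git clone https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps.git
  cd tao-toolkit-triton-apps
  bash scripts/start_server.sh   # downloads the default models, builds TensorRT engines, and starts Triton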

Could you please share the engine file conversion command for LPRNet?

The command from the notebook uses FP16, while the config.pbtxt uses FP32.

Run the default GitHub repo as-is; it will download the model and convert it to a TensorRT engine.
You need not change anything. Note that FP16 in the conversion command only sets the engine's internal compute precision; the input/output bindings stay FP32, which is what config.pbtxt describes.

I want to run my custom model.

You can run the default GitHub repo first.
Then replace the default .etlt model with your own.

Could you share the GitHub link?

See above.

Are you asking me to run the models in model_repository?

Firstly, run the default GitHub repo to get familiar with the process.

Then, replace the default .etlt model with your own and comment out the LPRNet download in

tao-toolkit-triton-apps/start_server.sh at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub

so that it does not download the default LPR model the next time you start the server; see the sketch below.
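
For example, the lines to comment out look something like this (illustrative only; the exact lines and NGC URL in the script may differ by commit):

  # wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt \
  #      -O tao_models/lprnet_model/us_lprnet_baseline18_deployable.etlt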

I was able to load the default model, but when I tried to use my custom model, this was the error:

I0721 10:33:55.982376 69 server.cc:592]
+------------------------------+---------+-----------------------------------------------------------------------------------+
| Model                        | Version | Status                                                                            |
+------------------------------+---------+-----------------------------------------------------------------------------------+
| dashcamnet_tao               | 1       | READY                                                                             |
| lprnet_tao                   | 1       | UNAVAILABLE: Invalid argument: model 'lprnet_tao', tensor 'tf_op_layer_ArgMax':   |
|                              |         | the model expects 2 dimensions (shape [-1,30]) but the model configuration spe    |
|                              |         | cifies 2 dimensions (an initial batch dimension because max_batch_size > 0 foll   |
|                              |         | owed by the explicit tensor shape, making complete shape [-1,24])                 |
| multitask_classification_tao | 1       | READY                                                                             |
| peoplenet_tao                | 1       | READY                                                                             |
| peoplesegnet_tao             | 1       | READY                                                                             |
| retinanet_tao                | 1       | READY                                                                             |
| vehicletypenet_tao           | 1       | READY                                                                             |
| yolov3_tao                   | 1       | READY                                                                             |
+------------------------------+---------+-----------------------------------------------------------------------------------+

This is my config.pbtxt file:

name: "lprnet_tao"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "image_input"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 48, 120 ]
  }
]
output [
  {
    name: "tf_op_layer_ArgMax"
    data_type: TYPE_INT32
    dims: [ 24 ]
  },
  {
    name: "tf_op_layer_Max"
    data_type: TYPE_FP32
    dims: [ 24 ]
  }
]
dynamic_batching { }

When I try to add -1 to the dimensions, this is the error message:

E0721 10:39:16.954900 70 model_repository_manager.cc:1890] Poll failed for model directory 'lprnet_tao': model input NHWC/NCHW require 3 dims for lprnet_tao

@Morganh, how do I fix this error?
What is causing this issue?

Did you ever run inference successfully with your LPRNet model in either of the ways below? (A sketch of the first command follows the list.)

  1. $ tao lprnet inference xxx
  2. Use the official inference app. See GitHub - NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream
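
For the first option, the command looks something like this (a sketch; the spec file, model path, and key are placeholders, so check the TAO LPRNet docs for the exact arguments):

  tao lprnet inference -e /workspace/specs/lprnet_inference_spec.txt \
                       -m /workspace/models/lprnet_custom.tlt \
                       -i /workspace/data/test_images \
                       -k <your_encryption_key>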

Also, please modify

https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/scripts/download_and_convert.sh#L44

since your model's shape is different.
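
For example, if your model takes a 3x48x120 input (as in your config.pbtxt), the tao-converter call would change along these lines (a sketch following the flags used in that script; the key and file names are placeholders):

  tao-converter /tao_models/lprnet_model/your_lprnet.etlt \
                -k <your_encryption_key> \
                -p image_input,1x3x48x120,4x3x48x120,16x3x48x120 \
                -t fp16 \
                -e /model_repository/lprnet_tao/1/model.plan

Since the engine reports tf_op_layer_ArgMax as [-1,30], the output dims in your config.pbtxt would also need to change from [ 24 ] to [ 30 ] (likewise for tf_op_layer_Max, which shares the same sequence length).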