Having issues converting LPRnet model with tao-converter to engine file for deployment in Deepstream (Jetson platform)

Hello, I used the TAO Toolkit to train an LPRNet model and am now trying to deploy it in DeepStream. I'm aware that an LPRNet .etlt file cannot be deployed directly and must first be converted to a TensorRT engine. However, tao-converter fails to parse the model and build the engine. I believe the model itself is functional, since tao lprnet inference in the TAO Toolkit produced correct results. I'd appreciate any help in converting this model.

• Hardware: Jetson Nano
• Network Type: LPRnet
• TLT Version 8.2 / CUDA Version 10.2

• Training spec:
random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 10
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-4
      soft_start: 0.001
      annealing: 0.7
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}

Command I used to export to etlt:

tao lprnet export -m lprnet_epoch-120.tlt \
  -k nvidia_tlt \
  -e lprnet_train.txt \
  --data_type fp16

Command I'm using to run tao-converter:
./tao-converter lprnet.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96

Output of tao-converter:

[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3226 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3226 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 3254 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/file6sZVSV
[INFO] ONNX IR version: 0.0.8
[INFO] Opset version: 15
[INFO] Producer name: keras2onnx
[INFO] Producer version: 1.12.2
[INFO] Domain: onnxmltools
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[ERROR] ModelImporter.cpp:773: While parsing node number 29 [Reshape -> "flatten_feature/Reshape:0"]:
[ERROR] ModelImporter.cpp:774: --- Begin node ---
[ERROR] ModelImporter.cpp:775: input: "permute_feature/transpose:0"
input: "shape_tensor2"
output: "flatten_feature/Reshape:0"
name: "flatten_feature"
op_type: "Reshape"
domain: ""

[ERROR] ModelImporter.cpp:776: --- End node ---
[ERROR] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:162 In function parseGraph:
[6] Invalid Node - flatten_feature
Attribute not found: allowzero
Invalid Node - flatten_feature
Attribute not found: allowzero
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[ERROR] 4: [network.cpp::validate::2633] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

May I know which version of TAO you are using?
$ tao info --verbose

Could you export a new etlt with the following option added?
--target_opset 12
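
For reference, here is a minimal sketch of how the suggested option might be combined with the export command from the original post. The converter log above reports opset 15 and a missing allowzero attribute on a Reshape node; allowzero was added to Reshape in ONNX opset 14, so exporting with a lower opset plausibly avoids it. The file names and key are the ones quoted earlier in this thread, the variable name EXPORT_CMD is only illustrative, and the snippet prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: assemble the export command from the original post
# and append the suggested --target_opset 12 option.
EXPORT_CMD="tao lprnet export -m lprnet_epoch-120.tlt -k nvidia_tlt"
EXPORT_CMD="$EXPORT_CMD -e lprnet_train.txt --data_type fp16"
EXPORT_CMD="$EXPORT_CMD --target_opset 12"

# Print the command for review (dry run); nothing is executed here.
echo "$EXPORT_CMD"
```

After checking the printed command, it can be pasted into the terminal where the TAO launcher is available.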

Hello. This fixed the issue for me, but I am now having another issue with the DeepStream Docker images. I am building the engine for use on a Jetson Orin. It builds successfully on the Orin itself, but when I try to deploy it in DeepStream running as a Docker image, there seems to be a library conflict. The logs are in the attached file:

docker_error.txt (3.5 KB)

So far I have tried building the model with different versions of tao-converter and testing different opsets. Nothing has solved the issue yet. I'd appreciate any help, thank you.

From the log,

ERROR: [TRT]: 6: The engine plan file is not compatible with this version of TensorRT, expecting library version got, please rebuild.
ERROR: [TRT]: 4: [runtime.cpp::deserializeCudaEngine::49] Error Code 4: Internal Error (Engine deserialization failed.)

This is caused by a TensorRT version mismatch between the environment where the engine was built and the environment where it is loaded for inference.

If you want to run inference inside deepstream docker, please build the engine inside it.
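
One way to confirm the mismatch is to compare the TensorRT version on the host with the one inside the container. The sketch below is an assumption-laden helper, not an official tool: it assumes a Debian/Ubuntu-based image where TensorRT is installed as libnvinfer packages, and the function names trt_version and check_trt_match are hypothetical.

```shell
#!/bin/sh
# Print the version of the first installed libnvinfer package, if any.
# Assumption: TensorRT was installed from .deb packages named libnvinfer<N>.
trt_version() {
  dpkg -l 2>/dev/null | awk '/libnvinfer[0-9]/ {print $3; exit}'
}

# Compare the version used to build the engine ($1) with the runtime version ($2).
check_trt_match() {
  if [ "$1" = "$2" ]; then
    echo "OK: versions match ($1)"
  else
    echo "MISMATCH: engine built with $1, runtime has $2"
  fi
}
```

Run trt_version once on the host and once inside the DeepStream container, then pass the two results to check_trt_match. A mismatch means the engine has to be rebuilt in the environment that will run it.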

I have tried building within the docker container with different tao-converter versions and different opsets. They all give this error when building:

/root/gpgpu/MachineLearning/myelin/src/compiler/optimizer/cublas_impl.cpp:480: void add_heuristic_results_to_tactics(std::vector<cublasLtMatmulHeuristicResult_t>&, std::vector<myelin::ir::tactic_attribute_t>&, myelin::ir::tactic_attribute_t&, bool): Assertion `false && "Invalid size written"' failed.
Aborted (core dumped)

To debug, please continue to use your docker container, and download lpr model via


then, run tao-converter via


Also, if you run inference on a Jetson Orin, I suggest pulling the L4T version of the DeepStream Docker image on the Orin, then building the TensorRT engine and running inference inside that L4T DeepStream container.

Hi, I managed to resolve the issue. Sharing the solution here for others:

It seems this is an issue people have been experiencing with the Orin.

Go to the directory that contains libcublas.so (/usr/local/cuda/lib64 in most cases) and create the symbolic links:

sudo ln -s libcublas.so.11 libcublas.so 
sudo ln -s libcublasLt.so.11 libcublasLt.so

Then run ldconfig. After that my error was resolved: the model built successfully inside the Docker container and was deployed.
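
For anyone who wants to script the steps above, here is a small sketch. The function name link_cublas is hypothetical, and the default directory is the one mentioned in this post. It only creates a link when the versioned .so.11 file exists and no unversioned file or link is already there, so it should be safe to run more than once.

```shell
#!/bin/sh
# Create unversioned cuBLAS symlinks next to the versioned libraries.
# $1: directory holding libcublas.so.11 (defaults to /usr/local/cuda/lib64).
link_cublas() {
  dir="${1:-/usr/local/cuda/lib64}"
  for lib in libcublas libcublasLt; do
    # Skip if the versioned library is missing or the link already exists.
    if [ -e "$dir/$lib.so.11" ] && [ ! -e "$dir/$lib.so" ] && [ ! -L "$dir/$lib.so" ]; then
      ln -s "$lib.so.11" "$dir/$lib.so"
    fi
  done
}
```

Inside the container this would be called as root, for example `link_cublas /usr/local/cuda/lib64 && ldconfig`.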

Thanks for the update and for sharing.