Can't load TRT engine: throws an instance of 'nvinfer1::MyelinError'

Hi, I am deploying the LPR model downloaded from NGC in Python. When I load the engine
trt_engine = load_engine(trt_runtime, trt_engine_path)

it says:
[TensorRT] ERROR: myelin/myelinGraphContext.h (26) - Myelin Error in MyelinGraphContext: 66 (myelinBinaryVersionMismatch : myelinGraphDeserializeBinary called with a buffer that's not a Myelin binary (invalid version)
)
terminate called after throwing an instance of 'nvinfer1::MyelinError'
  what():  std::exception
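
For context, load_engine is just a thin wrapper over the TensorRT runtime; a minimal sketch (assuming the standard TensorRT 7.x Python API; the body of the actual helper in my script may differ slightly):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

def load_engine(trt_runtime, engine_path):
    # Read the serialized engine from disk and deserialize it.
    # The Myelin error above is raised inside deserialize_cuda_engine
    # when the buffer was serialized by a different TensorRT build.
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    return trt_runtime.deserialize_cuda_engine(engine_data)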

My .etlt file was downloaded from NGC (wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt),
and I converted it to a .trt engine using tlt-converter:
tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8.trt -w 700000000

I run everything in the container.

You mention that you run everything in the container.
Which container did you run?

The .etlt file was converted in the TLT container nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3.
I deploy the .trt file in the TensorRT 20.10 container nvcr.io/nvidia/tensorrt:20.10-py3.
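
For reference, a serialized TensorRT engine can generally only be deserialized by the same TensorRT version that built it, which is what the myelinBinaryVersionMismatch error suggests. The versions in the two containers can be compared with:

import tensorrt as trt
# Run this in both the build container and the deploy container;
# the reported versions must match for the engine to load.
print(trt.__version__)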

Please generate the .trt file directly inside the TensorRT 20.10 container.
First, copy the .etlt file into that container and then download the tlt-converter according to Overview — TAO Toolkit 3.22.05 documentation.

I followed your advice and converted the .etlt file in the TensorRT 20.10 container with
./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8.trt

However, when deploying the .trt file I found the buffer size is wrong. The error information is:
Traceback (most recent call last):
  File "trt_old.py", line 243, in <module>
    inputs, outputs, bindings, stream = allocate_buffers(trt_engine)
  File "trt_old.py", line 66, in allocate_buffers
    host_mem = cuda.pagelocked_empty(size, dtype)
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

I found the engine binding shape is (-1, 3, 48, 96). The batch size can't be -1 and I don't know why.
Others have also come across the same question, as the post Python run LPRNet with TensorRT says.
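
For reference, the allocate_buffers in that code sizes the buffers straight from the binding shape, which goes wrong when the batch dim is -1. A sketch of a dynamic-shape-aware variant (batch_size is a parameter I'm adding for illustration; it must fall inside the 1..16 profile range):

import pycuda.autoinit  # noqa: F401, creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine, batch_size=1):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:  # iterates over binding names
        # Substitute the dynamic batch dim (-1) with the batch size
        # we actually intend to run, so trt.volume() stays positive.
        shape = [batch_size if dim == -1 else dim
                 for dim in engine.get_binding_shape(binding)]
        size = trt.volume(shape)
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append((host_mem, device_mem))
        else:
            outputs.append((host_mem, device_mem))
    return inputs, outputs, bindings, stream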

So, can I say that you meet the same issue as Python run LPRNet with TensorRT? What is your inference code?

Yes, the same issue.
My inference code is the same as in Python run LPRNet with TensorRT. I also tested the code with the LPD model (downloaded from NGC and converted with tlt-converter in the same way as the LPR model) to make sure the inference code works well.
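
For completeness: with a dynamic-shape engine, the execution context also needs the concrete input shape before running. A minimal sketch of the run step (assuming the buffers from the allocate_buffers sketch above; names are illustrative, and I'm assuming image_input is binding 0):

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Dynamic-shape engines require the actual shape to be set on the
    # execution context before execute_async_v2 is called.
    context.set_binding_shape(0, (batch_size, 3, 48, 96))
    for host_mem, device_mem in inputs:
        cuda.memcpy_htod_async(device_mem, host_mem, stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host_mem, device_mem in outputs:
        cuda.memcpy_dtoh_async(host_mem, device_mem, stream)
    stream.synchronize()
    return [host_mem for host_mem, _ in outputs]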

Can you reproduce my issue? Is it a problem with the LPR model or with the tlt-converter tool?

The LPR model is not based on TLT detectnet_v2, so it is different from the LPD model. See Overview — TAO Toolkit 3.22.05 documentation.

So, for the trt engine you have generated, please use the official inference command to check first.
See LPRNet — Transfer Learning Toolkit 3.0 documentation

I ran inference in the tlt3.0 container and it shows the same error:
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-04-30 06:55:59,865 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/specs/lpr_spec.txt
[TensorRT] ERROR: myelin/myelinGraphContext.h (26) - Myelin Error in MyelinGraphContext: 66 (myelinBinaryVersionMismatch : myelinGraphDeserializeBinary called with a buffer that's not a Myelin binary (invalid version)
)
terminate called after throwing an instance of 'nvinfer1::MyelinError'
  what():  std::exception
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/local/bin/lprnet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.

I also found the batch size is -1 during the conversion process.
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] Detected 1 inputs and 2 output network tensors.
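
A -1 batch in the "Detected input dimensions" line is expected for a dynamic-shape model; the real batch range comes from the -p profiles. Once an engine deserializes, the baked-in profile can be double-checked (a sketch, assuming an engine object from the load_engine sketch above and that get_profile_shape accepts the binding name):

# Profile 0 corresponds to the single -p option passed to tlt-converter.
min_shape, opt_shape, max_shape = engine.get_profile_shape(0, 'image_input')
print(min_shape, opt_shape, max_shape)
# expected: (1, 3, 48, 96) (4, 3, 48, 96) (16, 3, 48, 96)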

I am afraid that there is something wrong with the us_lprnet_baseline18_deployable.etlt downloaded from NGC:
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt

I do not think so. The etlt file should be fine. Some users can deploy it and run inference successfully. See

Can you share the full command you use to run inference?

The model conversion:

root@99e6798fbdc8:/workspace/lpr# tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8_old.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] Detected 1 inputs and 2 output network tensors.

The model inference:

root@99e6798fbdc8:/workspace/lpr# lprnet inference --gpu_index=0 -m lpr_us_onnx_int8_new.trt -i car1.jpg -e /workspace/specs/lpr_spec.txt --trt
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-04-30 08:05:19,440 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/specs/lpr_spec.txt
[TensorRT] ERROR: myelin/myelinGraphContext.h (26) - Myelin Error in MyelinGraphContext: 66 (myelinBinaryVersionMismatch : myelinGraphDeserializeBinary called with a buffer that's not a Myelin binary (invalid version)
)
terminate called after throwing an instance of 'nvinfer1::MyelinError'
  what():  std::exception
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/local/bin/lprnet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.

I ran everything in the tlt3.0 container.

Please download the below version of tlt-converter and use it to generate the trt engine again.

wget https://developer.nvidia.com/cuda111-cudnn80-trt72
unzip cuda111-cudnn80-trt72
chmod +x tlt-converter

See Overview - NVIDIA Docs

I tried again following your advice, but it failed again with the same error information.

I cannot reproduce the error. My steps are as below. Can you double-check again?

Steps:
$ tlt lprnet run /bin/bash

root@32b0be3ea045:/workspace/demo_2.0/lprnet# wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt

root@32b0be3ea045:/workspace/demo_2.0/lprnet# tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[WARNING] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[INFO] Detected 1 inputs and 2 output network tensors.

root@32b0be3ea045:/workspace/demo_2.0/lprnet# lprnet inference -m lpr_us_onnx_int8.trt -i /workspace/demo_2.0/lprnet/data/openalpr/train/image -e /workspace/examples/lprnet/specs/tutorial_spec.txt --trt
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-05-06 08:51:54,636 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/examples/lprnet/specs/tutorial_spec.txt
Using TRT engine for inference, setting batch size to the one in eval_config: 1
/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000178.jpg:6LSU216
/workspace/demo_2.0/lprnet/data/openalpr/train/image/car9-1.jpg:ASC7399
/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000189.jpg:FK4W3L
/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000171.jpg:DCK6344

I tried your steps and succeeded. I think the key is the tlt-converter. When I convert using the tlt-converter tool inside the tlt3.0 container, the inference succeeds.
But when I convert using the cuda111-cudnn80-trt72 tool, it still fails.

Thanks for the info. I will close this topic.