Hi, I am deploying the LPR model downloaded from NGC in Python. When I load the engine with
trt_engine = load_engine(trt_runtime, trt_engine_path)
it says:
[TensorRT] ERROR: myelin/myelinGraphContext.h (26) - Myelin Error in MyelinGraphContext: 66 (myelinBinaryVersionMismatch : myelinGraphDeserializeBinary called with a buffer that's not a Myelin binary (invalid version)
)
terminate called after throwing an instance of 'nvinfer1::MyelinError'
what(): std::exception
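For reference, load_engine here is just the standard TensorRT deserialization helper, roughly (a sketch, not my exact code):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(trt_runtime, engine_path):
    # Deserialization is the step that raises the Myelin version-mismatch
    # error when the engine was built with a different TensorRT version
    # than the one installed at runtime.
    with open(engine_path, "rb") as f:
        return trt_runtime.deserialize_cuda_engine(f.read())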
Please generate the .trt file directly inside the TensorRT 20.10 container.
First, copy the .etlt file into that container, then download the tlt-converter according to Overview — TAO Toolkit 3.22.05 documentation.
I followed your advice and converted the .etlt file in the TensorRT 20.10 container with:
./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8.trt
However, when I deploy the .trt file, the allocated buffer size comes out wrong. The error is:
Traceback (most recent call last):
File "trt_old.py", line 243, in <module>
inputs, outputs, bindings, stream = allocate_buffers(trt_engine)
File "trt_old.py", line 66, in allocate_buffers
host_mem = cuda.pagelocked_empty(size, dtype)
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory
I found that the engine binding shape is (-1, 3, 48, 96). The batch size can't be -1, and I don't understand why.
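As far as I understand, an engine built with -p has dynamic shapes, so the binding shape stays (-1, 3, 48, 96) until an execution context resolves it; allocate_buffers therefore has to set the input shape before sizing any buffer, otherwise the -1 flows into cuda.pagelocked_empty. A minimal sketch of that fix, assuming the single image_input and the 1-16 batch profile from the converter log (the context comes from engine.create_execution_context()):

import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine, context, batch_size=16):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    # Pin the dynamic batch dimension first; it must lie inside the
    # min/max range of the optimization profile (here 1 to 16).
    for i in range(engine.num_bindings):
        if engine.binding_is_input(i):
            shape = list(engine.get_binding_shape(i))
            shape[0] = batch_size
            context.set_binding_shape(i, tuple(shape))
    for i in range(engine.num_bindings):
        shape = context.get_binding_shape(i)  # fully resolved now
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host_mem = cuda.pagelocked_empty(trt.volume(shape), dtype)
        dev_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(dev_mem))
        (inputs if engine.binding_is_input(i) else outputs).append((host_mem, dev_mem))
    return inputs, outputs, bindings, stream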
Others have come across the same issue, as this post shows:
Yes, the same issue.
My inference code is the same as in Python run LPRNet with TensorRT. I tested the code with the LPD model (downloaded from NGC and converted with tlt-converter in the same way as the LPR model) to make sure the inference code works.
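For reference, the inference loop in that post follows the standard PyCUDA pattern, roughly (a sketch with assumed names):

import pycuda.driver as cuda

def do_inference(context, bindings, inputs, outputs, stream):
    # Copy inputs to the device, run the engine, copy outputs back.
    for host, dev in inputs:
        cuda.memcpy_htod_async(dev, host, stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host, dev in outputs:
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return [host for host, _ in outputs]

With a dynamic-shape (explicit-batch) engine, execute_async_v2 is the required call; the implicit-batch execute_async would fail here.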
I also ran inference in the tlt3.0 container and got the same error:
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-04-30 06:55:59,865 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/specs/lpr_spec.txt
[TensorRT] ERROR: myelin/myelinGraphContext.h (26) - Myelin Error in MyelinGraphContext: 66 (myelinBinaryVersionMismatch : myelinGraphDeserializeBinary called with a buffer that's not a Myelin binary (invalid version)
)
terminate called after throwing an instance of 'nvinfer1::MyelinError'
what(): std::exception
Aborted (core dumped)
Traceback (most recent call last):
File "/usr/local/bin/lprnet", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
I also see that the batch size is -1 during conversion:
root@99e6798fbdc8:/workspace/lpr# tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8_old.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] Detected 1 inputs and 2 output network tensors.
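The three shapes given to -p are simply the min/opt/max of a TensorRT optimization profile, which is why the log prints them back; in the Python builder API the equivalent would be roughly:

import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min, opt, max shapes matching the -p argument above
profile.set_shape("image_input", (1, 3, 48, 96), (4, 3, 48, 96), (16, 3, 48, 96))
config.add_optimization_profile(profile)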
The model inference:
root@99e6798fbdc8:/workspace/lpr# lprnet inference --gpu_index=0 -m lpr_us_onnx_int8_new.trt -i car1.jpg -e /workspace/specs/lpr_spec.txt --trt
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-04-30 08:05:19,440 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/specs/lpr_spec.txt
[TensorRT] ERROR: myelin/myelinGraphContext.h (26) - Myelin Error in MyelinGraphContext: 66 (myelinBinaryVersionMismatch : myelinGraphDeserializeBinary called with a buffer that's not a Myelin binary (invalid version)
)
terminate called after throwing an instance of 'nvinfer1::MyelinError'
what(): std::exception
Aborted (core dumped)
Traceback (most recent call last):
File "/usr/local/bin/lprnet", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
root@32b0be3ea045:/workspace/demo_2.0/lprnet# tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[WARNING] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[INFO] Detected 1 inputs and 2 output network tensors.
root@32b0be3ea045:/workspace/demo_2.0/lprnet# lprnet inference -m lpr_us_onnx_int8.trt -i /workspace/demo_2.0/lprnet/data/openalpr/train/image -e /workspace/examples/lprnet/specs/tutorial_spec.txt --trt
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-05-06 08:51:54,636 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/examples/lprnet/specs/tutorial_spec.txt
Using TRT engine for inference, setting batch size to the one in eval_config: 1
/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000178.jpg:6LSU216
/workspace/demo_2.0/lprnet/data/openalpr/train/image/car9-1.jpg:ASC7399
/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000189.jpg:FK4W3L
/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000171.jpg:DCK6344
I tried your steps and succeeded. I think the key is the tlt-converter version: converting with the tlt-converter inside the tlt3.0 container works, and inference then succeeds.
But converting with the cuda111-cudnn80-trt72 build of tlt-converter failed.
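A quick sanity check for this kind of mismatch is to print the TensorRT version inside each container, since a serialized engine is only valid for the exact TensorRT version (and GPU) it was built with:

import tensorrt as trt
print(trt.__version__)  # must match between the converter container and the inference container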