LPRNet can't use exported engine file

s.levsov · December 9, 2021, 6:53am

• Hardware RTX 3060
• Network Type LPRNet
• TAO toolkit_version: 3.21.11
• Training spec file
my_spec.txt (1.2 KB)

I followed the LPRNet notebook in cv_samples_v1.3.0 to train my own LPRNet model. Everything worked well, and I exported the model to a TensorRT engine file. Evaluating the engine file works, but if I try to load the model in my own code:

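    import tensorrt as trt

    def load_engine(trt_runtime, engine_path):
        # Register the TensorRT plugins before deserializing the engine.
        trt.init_libnvinfer_plugins(None, "")
        # Read the serialized engine from disk.
        with open(engine_path, 'rb') as f:
            engine_data = f.read()
        # Deserialize the engine with the given trt.Runtime.
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine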

I get an error:

[12/09/2021-08:14:00] [TRT] [E] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[12/09/2021-08:14:00] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
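
The errors above are logged by TensorRT rather than raised as a Python exception. A minimal sketch of a more defensive loader follows, assuming the Python bindings return `None` from `deserialize_cuda_engine` when deserialization fails. The names `load_engine_checked` and `FakeRuntime` are hypothetical and exist only for illustration; in real code `trt_runtime` would be a `trt.Runtime`, and plugin registration is left out here so the sketch does not depend on TensorRT being installed.

```python
def load_engine_checked(trt_runtime, engine_path):
    """Deserialize an engine and fail loudly if the runtime returns None."""
    with open(engine_path, "rb") as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    if engine is None:
        # Assumption: a failed deserialization yields None instead of raising.
        raise RuntimeError(
            f"Could not deserialize {engine_path!r}; check that the engine was "
            "built with the same TensorRT version as this runtime."
        )
    return engine


class FakeRuntime:
    """Hypothetical stand-in for trt.Runtime that always fails to deserialize."""

    def deserialize_cuda_engine(self, data):
        return None
```

Raising at load time makes a version mismatch surface immediately, instead of later as an attribute error on a `None` engine.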

From googling, I found out that this is due to a TensorRT version mismatch.
On my machine:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

$ dpkg -l | grep TensorRT
ii  libnvinfer-bin                                              8.0.1-1+cuda11.3                      amd64        TensorRT binaries
ii  libnvinfer-dev                                              8.0.1-1+cuda11.3                      amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                              8.0.1-1+cuda11.3                      all          TensorRT documentation
ii  libnvinfer-plugin-dev                                       8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                                          8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                          8.0.1-1+cuda11.3                      all          TensorRT samples
ii  libnvinfer8                                                 8.0.1-1+cuda11.3                      amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                                        8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                                           8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                                            8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
ii  libnvparsers8                                               8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
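
A quick way to confirm that all installed TensorRT packages report the same version is to parse the `dpkg -l` listing. The sketch below is illustrative only: `tensorrt_package_versions` is a hypothetical helper, and the sample text is three lines taken from the listing above with the column spacing shortened.

```python
def tensorrt_package_versions(dpkg_output):
    """Map package name -> version for TensorRT packages in `dpkg -l` output."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages start with the status "ii"; the description
        # column mentions TensorRT for the libnvinfer family.
        if len(fields) >= 3 and fields[0] == "ii" and "TensorRT" in line:
            versions[fields[1]] = fields[2]
    return versions


sample = """\
ii  libnvinfer8          8.0.1-1+cuda11.3  amd64  TensorRT runtime libraries
ii  libnvinfer-plugin8   8.0.1-1+cuda11.3  amd64  TensorRT plugin libraries
ii  libnvonnxparsers8    8.0.1-1+cuda11.3  amd64  TensorRT ONNX libraries
"""

versions = tensorrt_package_versions(sample)
distinct = set(versions.values())
print(versions)
print("consistent" if len(distinct) == 1 else f"mixed versions: {distinct}")
```

If the Python bindings were installed separately (for example via pip), `tensorrt.__version__` is another value worth comparing against these package versions.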

Morganh · December 9, 2021, 10:11am

Yes. If you want to deploy the LPRNet TensorRT engine on a device, please make sure the TensorRT version on that device matches the version the engine was built with.

You can use the corresponding tao-converter to regenerate the LPRNet TensorRT engine from your .etlt model.

https://docs.nvidia.com/tao/tao-toolkit/text/tensorrt.html#installing-the-tao-converter

s.levsov · December 9, 2021, 10:22am

I tried using tao-converter-x86-tensorrt8.0, but it gives me another error:

[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)

Morganh · December 9, 2021, 10:23am

Which device do you want to run your TensorRT engine on? The RTX 3060?

s.levsov · December 9, 2021, 10:27am

RTX 3060, RTX A200 and Jetson Xavier NX (Jetpack 4.4)

Morganh · December 9, 2021, 10:31am

Which device hit the above error "[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin"?

s.levsov · December 9, 2021, 10:36am

It was RTX 3060

Morganh · December 9, 2021, 10:40am

Could you share the full command and full log?

s.levsov · December 9, 2021, 10:44am

Sure:

./tao-converter lprnet.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 -e lprnet.engine

[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 574 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileSngjLn
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 574 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +806, GPU +350, now: CPU 1401, GPU 924 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1526, GPU 982 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2694, GPU 1503 (MiB)
[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
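
For reference, the `-p` argument in the command above carries the input tensor name followed by the min, opt, and max shapes, which the log echoes as the optimization profile. The sketch below parses such a spec and checks whether a batch size falls inside the profile. The helper names are hypothetical, and it assumes the batch dimension comes first, as in the `(-1, 3, 48, 96)` shape reported by the log.

```python
def parse_profile(spec):
    """Split 'name,min,opt,max' into a tensor name and three shape tuples."""
    name, *shapes = spec.split(",")
    if len(shapes) != 3:
        raise ValueError("expected <name>,<min>,<opt>,<max>")
    min_shape, opt_shape, max_shape = (
        tuple(int(d) for d in s.split("x")) for s in shapes
    )
    return name, min_shape, opt_shape, max_shape


def batch_fits(spec, batch_size):
    """True if batch_size lies within the profile's min/max batch dimension."""
    _, min_shape, _, max_shape = parse_profile(spec)
    return min_shape[0] <= batch_size <= max_shape[0]


spec = "image_input,1x3x48x96,4x3x48x96,16x3x48x96"
print(parse_profile(spec))
print(batch_fits(spec, 8), batch_fits(spec, 32))
```

With this spec, a batch of 8 fits the profile and a batch of 32 does not.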

Morganh · December 9, 2021, 10:48am

To narrow this down, could you download the official sample .etlt model and retry?
wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt

./tao-converter us_lprnet_baseline18_deployable.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96  -t fp16 -e lprnet.engine
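
When the same conversion has to be repeated on several machines, it can help to build the command programmatically. This is only a sketch: `tao_converter_command` is a hypothetical helper, and it uses just the flags that appear in this thread (`-k`, `-p`, `-t`, `-e`) without assuming anything else about tao-converter.

```python
import shlex


def tao_converter_command(etlt_path, key, profile, engine_path, precision=None):
    """Build the argument list for the tao-converter calls shown in this thread."""
    cmd = ["./tao-converter", etlt_path, "-k", key, "-p", profile]
    if precision is not None:
        # -t is only added when a precision such as "fp16" is requested.
        cmd += ["-t", precision]
    cmd += ["-e", engine_path]
    return cmd


cmd = tao_converter_command(
    "us_lprnet_baseline18_deployable.etlt",
    "nvidia_tlt",
    "image_input,1x3x48x96,4x3x48x96,16x3x48x96",
    "lprnet.engine",
    precision="fp16",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on the machine where the engine will be deployed.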

s.levsov · December 9, 2021, 10:50am

Same error:

[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 590 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileG7Jijo
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    12
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.7.0
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 590 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +806, GPU +350, now: CPU 1401, GPU 940 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1526, GPU 998 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2695, GPU 1518 (MiB)
[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Morganh · December 9, 2021, 10:51am

May I know whether you ran the above command inside or outside the TAO docker?

s.levsov · December 9, 2021, 10:52am

Outside tao docker.

s.levsov · December 9, 2021, 10:55am

If I run the converter inside docker, the conversion runs fine, but the engine isn't usable outside of it.

Morganh · December 9, 2021, 11:06am

OK. Since generating the TensorRT engine fails even with the official sample .etlt model, something must be mismatched on your RTX 3060 machine.
Please double-check the CUDA/cuDNN/TensorRT installation.

Also, you can try running on the Xavier NX first. Please make sure to use the corresponding version of tao-converter.
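
As a starting point for that check, the sketch below collects two version hints from the host: the release line from `nvcc --version` and the version of the `tensorrt` Python module, if either is present. `installation_report` is a hypothetical helper, and it does not cover cuDNN or the system `libnvinfer` packages, which still need to be checked separately (for example with `dpkg -l`).

```python
import importlib
import shutil
import subprocess


def installation_report():
    """Collect version hints for the CUDA toolkit and the TensorRT Python module."""
    report = {"nvcc": None, "tensorrt_python": None}

    nvcc = shutil.which("nvcc")
    if nvcc:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        lines = result.stdout.strip().splitlines()
        # nvcc prints the toolkit release on a line containing "release".
        report["nvcc"] = next((l.strip() for l in lines if "release" in l), None)

    try:
        trt = importlib.import_module("tensorrt")
        report["tensorrt_python"] = getattr(trt, "__version__", None)
    except (ImportError, OSError):
        # Module missing or its shared libraries could not be loaded.
        pass

    return report


print(installation_report())
```

A `None` entry means the tool or module was not found on this machine.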

s.levsov · December 9, 2021, 11:07am

Will try and let you know!
Thanks

s.levsov · December 9, 2021, 12:16pm

I have tried a different approach.
I installed TAO via pip (not in a venv). When I try to convert the model, I get a new error.

2021-12-09 13:52:13,742 [INFO] root: Registry: ['nvcr.io']
2021-12-09 13:52:13,774 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
Error: no input dimensions given
2021-12-09 13:52:15,960 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · December 9, 2021, 12:36pm

On the Xavier NX, just copy the .etlt model onto it, download the correct version of tao-converter, and then generate the TensorRT engine. You do not need to install TAO.

system · December 28, 2021, 5:50am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
