LPRNet can't use exported engine file

• Hardware RTX 3060
• Network Type LPRNet
• TAO toolkit_version: 3.21.11
• Training spec file
my_spec.txt (1.2 KB)

I followed the LPRNet notebook in cv_samples_v1.3.0 to train my own LPRNet model. Everything worked well, and I exported the model to a TensorRT engine file. Evaluating the engine file works, but when I try to load it in my own code:

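    import tensorrt as trt

    def load_engine(trt_runtime, engine_path):
        # Register the built-in TensorRT plugins before deserializing.
        trt.init_libnvinfer_plugins(None, "")
        # Read the serialized engine from disk.
        with open(engine_path, 'rb') as f:
            engine_data = f.read()
        # The result should be checked: deserialization can fail (e.g. on a version mismatch).
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine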

I get an error:

[12/09/2021-08:14:00] [TRT] [E] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[12/09/2021-08:14:00] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
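
The two numbers in the first error line can be extracted in a script to detect this failure mode. The following is a minimal sketch, assuming the message format shown above; `parse_version_tags` is a hypothetical helper name, and the numbers appear to be internal serialization tags rather than TensorRT release numbers.

```python
import re

# Matches the tag pair printed in the deserialization error above.
_TAGS = re.compile(r"Current Version: (\d+), Serialized Engine Version: (\d+)")


def parse_version_tags(log_line):
    """Return (current, serialized) tags as ints, or None if the line has none."""
    match = _TAGS.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Tail of the error line from the log above.
line = ("Version tag does not match. Note: Current Version: 0, "
        "Serialized Engine Version: 43)")
tags = parse_version_tags(line)
if tags is not None and tags[0] != tags[1]:
    print("serialization tags differ:", tags)
```

A differing pair suggests the engine was built by a different TensorRT installation than the one trying to load it.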

From searching, I found out that this is caused by a TensorRT version mismatch.
Versions on my machine:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
$ dpkg -l | grep TensorRT
ii  libnvinfer-bin                                              8.0.1-1+cuda11.3                      amd64        TensorRT binaries
ii  libnvinfer-dev                                              8.0.1-1+cuda11.3                      amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                              8.0.1-1+cuda11.3                      all          TensorRT documentation
ii  libnvinfer-plugin-dev                                       8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                                          8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                          8.0.1-1+cuda11.3                      all          TensorRT samples
ii  libnvinfer8                                                 8.0.1-1+cuda11.3                      amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                                        8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                                           8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                                            8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
ii  libnvparsers8                                               8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
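
To compare the installed runtime against the version used to build an engine, the `dpkg -l` listing above can be parsed in a script. This is a minimal sketch, assuming the column layout shown above (status, name, version, architecture, description); `installed_tensorrt_version` is a hypothetical helper name, not part of any NVIDIA tool.

```python
def installed_tensorrt_version(dpkg_output, package="libnvinfer8"):
    """Return the upstream version (e.g. '8.0.1') of `package`, or None."""
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages are listed as: ii <name> <version> <arch> <description>
        if len(fields) >= 3 and fields[0] == "ii" and fields[1] == package:
            # Drop the packaging suffix: '8.0.1-1+cuda11.3' -> '8.0.1'
            return fields[2].split("-", 1)[0]
    return None


# One line copied from the listing above.
sample = ("ii  libnvinfer8   8.0.1-1+cuda11.3   amd64   "
          "TensorRT runtime libraries")
print(installed_tensorrt_version(sample))
```

In practice the text passed in would be the captured output of `dpkg -l`; the sample line here only stands in for it.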

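Yes. A serialized TensorRT engine can only be loaded by the TensorRT version that built it, so make sure the TensorRT version on the deployment device matches the one used to generate the LPRNet engine.

You can use the tao-converter build that corresponds to your TensorRT version to regenerate the LPRNet engine from your .etlt model.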

https://docs.nvidia.com/tao/tao-toolkit/text/tensorrt.html#installing-the-tao-converter

https://docs.nvidia.com/tao/tao-toolkit/text/character_recognition/lprnet.html#using-tao-converter

I tried tao-converter-x86-tensorrt8.0, but it gives me a different error:

[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)

Which device do you want to run your TensorRT engine on? The RTX 3060?

RTX 3060, RTX A200 and Jetson Xavier NX (Jetpack 4.4)

Which device produced the error "[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin " above?

It was RTX 3060

Could you share the full command and full log?

Sure:

./tao-converter lprnet.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 -e lprnet.engine
[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 574 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileSngjLn
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 574 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +806, GPU +350, now: CPU 1401, GPU 924 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1526, GPU 982 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2694, GPU 1503 (MiB)
[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
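
For reference, the `-p` argument in the command above packs the input name and the min, opt, and max shapes of the optimization profile into one comma-separated string, which the log echoes back. The following is a minimal sketch of building that string and the command line in Python; `profile_arg` and `converter_command` are hypothetical helper names, and only the flags that appear in this thread (`-k`, `-p`, `-e`) are used.

```python
def profile_arg(input_name, min_shape, opt_shape, max_shape):
    """Format '<name>,<min>,<opt>,<max>' with 'x'-separated dimensions."""
    shapes = (min_shape, opt_shape, max_shape)
    if len({len(s) for s in shapes}) != 1:
        raise ValueError("min, opt and max shapes must have the same rank")
    # An optimization profile needs min <= opt <= max in every dimension.
    for lo, mid, hi in zip(*shapes):
        if not lo <= mid <= hi:
            raise ValueError("each dimension must satisfy min <= opt <= max")
    parts = [input_name] + ["x".join(str(d) for d in s) for s in shapes]
    return ",".join(parts)


def converter_command(etlt_path, key, profile, engine_path,
                      binary="./tao-converter"):
    """Return the argument list for the invocation shown in this thread."""
    return [binary, etlt_path, "-k", key, "-p", profile, "-e", engine_path]


profile = profile_arg("image_input",
                      (1, 3, 48, 96), (4, 3, 48, 96), (16, 3, 48, 96))
cmd = converter_command("lprnet.etlt", "nvidia_tlt", profile, "lprnet.engine")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` as is; building it as a list avoids shell quoting issues.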

To narrow this down, could you download the official sample .etlt model and retry?
wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt

./tao-converter us_lprnet_baseline18_deployable.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96  -t fp16 -e lprnet.engine

Same error:

[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 590 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileG7Jijo
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    12
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.7.0
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 590 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +806, GPU +350, now: CPU 1401, GPU 940 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1526, GPU 998 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2695, GPU 1518 (MiB)
[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Did you run the above command inside or outside the TAO docker container?

Outside tao docker.

If I run the converter inside the docker container, the conversion runs fine, but the resulting engine isn't usable outside of it.

OK. Since engine generation also fails with the official sample .etlt model, something in the software stack on your RTX 3060 machine must be mismatched.
Please double-check the CUDA, cuDNN, and TensorRT installation.

Also, you can try running on the Xavier NX first. Please make sure to use the corresponding version of tao-converter.

Will try and let you know!
Thanks

I tried a different approach.
I installed TAO via pip (not in a venv). When I try to convert the model, I get a new error:

2021-12-09 13:52:13,742 [INFO] root: Registry: ['nvcr.io']
2021-12-09 13:52:13,774 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
Error: no input dimensions given
2021-12-09 13:52:15,960 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

On the Xavier NX, just copy the .etlt model onto the device, download the correct version of tao-converter, and generate the TensorRT engine there. You do not need to install TAO.
