• Hardware: RTX 3060
• Network type: LPRNet
• TAO Toolkit version: 3.21.11
• Training spec file: my_spec.txt (1.2 KB)
I followed the LPRNet notebook in cv_samples_v1.3.0 to train LPRNet on my own data. Everything worked well, and I exported the model to a TensorRT engine file. Evaluation of the engine file works, but I hit a problem when I try to load the engine in my own code:
```python
import tensorrt as trt

def load_engine(trt_runtime, engine_path):
    # Register the default TensorRT plugins before deserializing.
    trt.init_libnvinfer_plugins(None, "")
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    # Fails here with the serialization error below.
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine

# called as: load_engine(trt.Runtime(trt.Logger(trt.Logger.WARNING)), "lprnet.engine")
```
I get an error:
[12/09/2021-08:14:00] [TRT] [E] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[12/09/2021-08:14:00] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
From googling, I found that this is caused by a TensorRT version mismatch: the engine was serialized with a different TensorRT version than the one installed on my machine.
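For context, TensorRT engines are only guaranteed to deserialize on the exact TensorRT version (and GPU) they were built with. A minimal sketch of the check implied by the error, using a hypothetical `versions_match` helper (the build version is not readable from the `.engine` file through a public API; it has to come from the logs of the tool that built the engine, e.g. tao-converter or trtexec):

```python
def versions_match(build_version: str, runtime_version: str) -> bool:
    """Engines are only guaranteed to load on the exact same
    major.minor.patch TensorRT version that serialized them."""
    return build_version.split('.')[:3] == runtime_version.split('.')[:3]

# Example: engine built inside the TAO container vs. the local TensorRT 8.0.1
print(versions_match("8.0.1", "8.0.1"))  # True  -> engine should deserialize
print(versions_match("7.2.3", "8.0.1"))  # False -> rebuild the engine locally
```

In practice the fix is to regenerate the engine on the target machine (with its installed TensorRT) rather than to copy an engine file between environments.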
On my machine:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
$ dpkg -l | grep TensorRT
ii libnvinfer-bin 8.0.1-1+cuda11.3 amd64 TensorRT binaries
ii libnvinfer-dev 8.0.1-1+cuda11.3 amd64 TensorRT development libraries and headers
ii libnvinfer-doc 8.0.1-1+cuda11.3 all TensorRT documentation
ii libnvinfer-plugin-dev 8.0.1-1+cuda11.3 amd64 TensorRT plugin libraries
ii libnvinfer-plugin8 8.0.1-1+cuda11.3 amd64 TensorRT plugin libraries
ii libnvinfer-samples 8.0.1-1+cuda11.3 all TensorRT samples
ii libnvinfer8 8.0.1-1+cuda11.3 amd64 TensorRT runtime libraries
ii libnvonnxparsers-dev 8.0.1-1+cuda11.3 amd64 TensorRT ONNX libraries
ii libnvonnxparsers8 8.0.1-1+cuda11.3 amd64 TensorRT ONNX libraries
ii libnvparsers-dev 8.0.1-1+cuda11.3 amd64 TensorRT parsers libraries
ii libnvparsers8 8.0.1-1+cuda11.3 amd64 TensorRT parsers libraries
[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 574 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileSngjLn
[INFO] ONNX IR version: 0.0.7
[INFO] Opset version: 13
[INFO] Producer name: keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain: onnxmltools
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 574 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +806, GPU +350, now: CPU 1401, GPU 924 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1526, GPU 982 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2694, GPU 1503 (MiB)
[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 590 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileG7Jijo
[INFO] ONNX IR version: 0.0.7
[INFO] Opset version: 12
[INFO] Producer name: keras2onnx
[INFO] Producer version: 1.7.0
[INFO] Domain: onnxmltools
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 590 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +806, GPU +350, now: CPU 1401, GPU 940 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1526, GPU 998 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2695, GPU 1518 (MiB)
[ERROR] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
OK, since it also fails to generate a TensorRT engine from the official sample .etlt model, something is mismatched in your RTX 3060 environment.
Please double-check the CUDA/cuDNN/TensorRT installation.
Additionally, you can try running on a Xavier NX first. Please make sure to use the corresponding version of tao-converter.
On the Xavier NX, simply copy the .etlt model onto it, download the correct version of tao-converter, and then generate the TRT engine. There is no need to install TAO itself.
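As a sketch of that last step, the command below follows the pattern NVIDIA's LPR sample documents for tao-converter with an LPRNet model. The file paths are placeholders, the `-p` shapes mirror the optimization profiles in the logs above, and the key `nvidia_tlt` applies to NVIDIA's pretrained model; a model you trained yourself must use the encryption key you set during training/export:

```shell
# Run on the target device (e.g. Xavier NX) with the tao-converter build
# matching its JetPack/TensorRT version.
./tao-converter \
  -k nvidia_tlt \                               # model key (yours, if custom-trained)
  -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 \  # min/opt/max profile shapes
  -t fp16 \                                     # precision
  -e lprnet.engine \                            # output engine path
  lprnet.etlt                                   # input .etlt model
```

The resulting `lprnet.engine` is tied to that device's TensorRT version and GPU, so it must be generated where it will be used.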