Unable to load a TensorRT model exported by the TAO converter

Please provide the following information when requesting support.

• Hardware: RTX 3090
• Network type: vgg16
• Output of !tao info:

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022

• TensorRT version: 8.4.0-1+cuda10.2
• Spec file: unet_retrain_vgg_6S.txt (1.4 KB)

When loading the model from a C++ program I get a "Version tag does not match" error:

[04/12/2022-18:45:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +177, GPU +0, now: CPU 202, GPU 2098 (MiB)
[04/12/2022-18:45:04] [I] [TRT] Loaded engine size: 23 MiB
[04/12/2022-18:45:04] [E] [TRT] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[04/12/2022-18:45:04] [E] [TRT] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
segmentation_tutorial: tutorial-runtime.cpp:81: SampleSegmentation::SampleSegmentation(const string&): Assertion `mEngine.get() != nullptr' failed.
Aborted (core dumped)

In the UNet notebook I exported the model with:

!tao converter -k $KEY  \
               -c $USER_EXPERIMENT_DIR/export/cal.bin \
               -e $USER_EXPERIMENT_DIR/export/int8.tlt.engine \
               -i nchw \
               -t int8 \
               -p input_1,1x3x512x512,4x3x512x512,16x3x512x512 \
               $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.etlt
2022-04-12 19:39:15,289 [INFO] root: Registry: ['nvcr.io']
2022-04-12 19:39:15,452 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
[INFO] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 2360 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filex1sQ36
[INFO] ONNX IR version:  0.0.6
[INFO] Opset version:    11
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 3, 512, 512)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 512, 512) for input: input_1
[INFO] Using optimization profile opt shape: (4, 3, 512, 512) for input: input_1
[INFO] Using optimization profile max shape: (16, 3, 512, 512) for input: input_1
[INFO] [MemUsageSnapshot] Builder begin: CPU 602 MiB, GPU 2360 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1351, GPU 2678 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +619, GPU +268, now: CPU 1970, GPU 2946 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 74016
[INFO] Total Device Persistent Memory: 24048640
[INFO] Total Scratch Memory: 408969216
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 16 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2892, GPU 3381 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2892, GPU 3389 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2892, GPU 3373 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2891, GPU 3357 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2891 MiB, GPU 3357 MiB
2022-04-12 19:40:43,842 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
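
Note that the engine above was built with dynamic shapes (min/opt/max profiles for input_1), while the quick-start sample was written for a fixed-shape engine. A minimal sketch of what would also be needed before inference, assuming binding index 0 is input_1 and a batch size of 1 (index and shape here are illustrative, not taken from the sample):

// Resolve the dynamic input shape on the execution context before enqueueV2()
// (TensorRT 8.x API; binding 0 assumed to be "input_1" with the 1..16 x 3x512x512 profile).
context->setBindingDimensions(0, nvinfer1::Dims4{1, 3, 512, 512});
assert(context->allInputDimensionsSpecified());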

To try the model, I am using the TensorRT Quick Start SemanticSegmentation C++ program, attached:

tutorial-runtime.cpp (6.4 KB)

built with the command: make CUDA_INSTALL_DIR=/usr/local/cuda-10.2

The error occurs at line 82:

mEngine.reset(runtime->deserializeCudaEngine(engineData.data(), fsize, nullptr));
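
The version-tag assertion usually means the engine was serialized by a different TensorRT version (the one inside the TAO container) than the one the application links against on the host. A small sketch to print both versions for comparison, using the standard TensorRT headers (not part of the sample):

#include "NvInfer.h"   // brings in NvInferVersion.h and getInferLibVersion()
#include <cstdio>

int main()
{
    // Version the binary was compiled against (header macros).
    std::printf("Headers: TensorRT %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Version of the TensorRT library actually loaded at runtime (e.g. 8400 for 8.4.0).
    std::printf("Runtime library: %d\n", getInferLibVersion());
    return 0;
}

The engine only deserializes when the runtime library matches the TensorRT version that the TAO converter used to build it.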

UPDATE 1:

I launched the TAO docker like this:

nvidia-docker run --ipc=host  --rm -it --gpus all -p 8888:8888 -v `pwd`:/workspace -w /workspace/SemanticSegmentation nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 bash

I ran make and then ran the app from within the docker; the versioning issue is gone, but now I get:

[E] [TRT] 3: Cannot find binding of given name: output

This is the complete console output from the run:

root@fbd456ab264d:/workspace/SemanticSegmentation/bin# ./segmentation_tutorial
Reading File: ../int8.tlt.engine
[04/12/2022-20:35:24] [I] [TRT] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 564, GPU 2445 (MiB)
[04/12/2022-20:35:24] [I] [TRT] Loaded engine size: 24 MB
[04/12/2022-20:35:24] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 564 MiB, GPU 2445 MiB
[04/12/2022-20:35:24] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1313, GPU 2787 (MiB)
[04/12/2022-20:35:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +619, GPU +268, now: CPU 1932, GPU 3055 (MiB)
[04/12/2022-20:35:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1223, GPU 2733 (MiB)
[04/12/2022-20:35:25] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1223 MiB, GPU 2733 MiB
../input.png Exists
[04/12/2022-20:35:25] [I] Running TensorRT inference for int8.tlt.engine
[04/12/2022-20:35:25] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1199 MiB, GPU 2733 MiB
[04/12/2022-20:35:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +709, GPU +314, now: CPU 1908, GPU 3047 (MiB)
[04/12/2022-20:35:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1908, GPU 3055 (MiB)
[04/12/2022-20:35:26] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1914 MiB, GPU 5089 MiB
[04/12/2022-20:35:26] [E] [TRT] 3: Cannot find binding of given name: output
[04/12/2022-20:35:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1205, GPU 4743 (MiB)
 

What is the proper output binding name for UNet with a vgg16 backbone? Is it softmax_1?
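
One way to answer this without guessing is to list the bindings the deserialized engine actually exposes; a short sketch against the sample's mEngine (TensorRT 8.x binding API):

// Print every binding name and direction so the correct tensor names
// can be used instead of the hard-coded "output".
for (int i = 0; i < mEngine->getNbBindings(); ++i)
{
    std::printf("binding %d: %s (%s)\n", i,
                mEngine->getBindingName(i),
                mEngine->bindingIsInput(i) ? "input" : "output");
}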

Please refer to the official DeepStream inference GitHub repo for TAO models:

https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/unet_tao/pgie_unet_tao_config.txt#L43

output-blob-names=softmax_1
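
In the quick-start sample that corresponds to looking up the UNet blob names instead of the hard-coded ones, e.g. (assuming the input name from the converter log and the output name from the DeepStream config above):

const int inputIndex  = mEngine->getBindingIndex("input_1");
const int outputIndex = mEngine->getBindingIndex("softmax_1");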
