Unable to load a TensorRT model exported by the TAO converter

Please provide the following information when requesting support.

• Hardware: RTX 3090
• Network type: vgg16
• Output of !tao info:

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022

• TensorRT version: 8.4.0-1+cuda10.2
• Spec file: unet_retrain_vgg_6S.txt (1.4 KB)

When loading the model from a C++ program I get a "Version tag does not match" error:

[04/12/2022-18:45:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +177, GPU +0, now: CPU 202, GPU 2098 (MiB)
[04/12/2022-18:45:04] [I] [TRT] Loaded engine size: 23 MiB
[04/12/2022-18:45:04] [E] [TRT] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[04/12/2022-18:45:04] [E] [TRT] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
segmentation_tutorial: tutorial-runtime.cpp:81: SampleSegmentation::SampleSegmentation(const string&): Assertion `mEngine.get() != nullptr' failed.
Aborted (core dumped)

In the UNet notebook I exported the model with:

!tao converter -k $KEY  \
               -c $USER_EXPERIMENT_DIR/export/cal.bin \
               -e $USER_EXPERIMENT_DIR/export/int8.tlt.engine \
               -i nchw \
               -t int8 \
               -p input_1,1x3x512x512,4x3x512x512,16x3x512x512 \
               $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.etlt
2022-04-12 19:39:15,289 [INFO] root: Registry: ['nvcr.io']
2022-04-12 19:39:15,452 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
[INFO] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 2360 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filex1sQ36
[INFO] ONNX IR version:  0.0.6
[INFO] Opset version:    11
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 3, 512, 512)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 512, 512) for input: input_1
[INFO] Using optimization profile opt shape: (4, 3, 512, 512) for input: input_1
[INFO] Using optimization profile max shape: (16, 3, 512, 512) for input: input_1
[INFO] [MemUsageSnapshot] Builder begin: CPU 602 MiB, GPU 2360 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1351, GPU 2678 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +619, GPU +268, now: CPU 1970, GPU 2946 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 74016
[INFO] Total Device Persistent Memory: 24048640
[INFO] Total Scratch Memory: 408969216
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 16 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2892, GPU 3381 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2892, GPU 3389 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2892, GPU 3373 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2891, GPU 3357 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2891 MiB, GPU 3357 MiB
2022-04-12 19:40:43,842 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
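
Note that the engine above was built with dynamic shapes (min/opt/max profiles for input_1), while the quick-start sample was written for a fixed-shape engine. A minimal sketch of what would also be needed before inference, assuming binding index 0 is input_1 and a batch size of 1 (index and shape here are illustrative, not taken from the sample):

// Resolve the dynamic input shape on the execution context before enqueueV2()
// (TensorRT 8.x API; binding 0 assumed to be "input_1" with the 1..16 x 3x512x512 profile).
context->setBindingDimensions(0, nvinfer1::Dims4{1, 3, 512, 512});
assert(context->allInputDimensionsSpecified());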

To try the model, I am using the TensorRT Quick Start SemanticSegmentation C++ program, attached:

tutorial-runtime.cpp (6.4 KB)

built with the command: make CUDA_INSTALL_DIR=/usr/local/cuda-10.2

The error occurs at line 82:

mEngine.reset(runtime->deserializeCudaEngine(engineData.data(), fsize, nullptr));
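
The version-tag assertion usually means the engine was serialized by a different TensorRT version (the one inside the TAO container) than the one the application links against on the host. A small sketch to print both versions for comparison, using the standard TensorRT headers (not part of the sample):

#include "NvInfer.h"   // brings in NvInferVersion.h and getInferLibVersion()
#include <cstdio>

int main()
{
    // Version the binary was compiled against (header macros).
    std::printf("Headers: TensorRT %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Version of the TensorRT library actually loaded at runtime (e.g. 8400 for 8.4.0).
    std::printf("Runtime library: %d\n", getInferLibVersion());
    return 0;
}

The engine only deserializes when the runtime library matches the TensorRT version that the TAO converter used to build it.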

UPDATE 1:

I launched the TAO docker like this:

nvidia-docker run --ipc=host  --rm -it --gpus all -p 8888:8888 -v `pwd`:/workspace -w /workspace/SemanticSegmentation nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 bash

I ran make and then ran the app from within the docker; the versioning issue is gone, but now I get:

[E] [TRT] 3: Cannot find binding of given name: output

This is the complete console output from the run:

root@fbd456ab264d:/workspace/SemanticSegmentation/bin# ./segmentation_tutorial
Reading File: ../int8.tlt.engine
[04/12/2022-20:35:24] [I] [TRT] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 564, GPU 2445 (MiB)
[04/12/2022-20:35:24] [I] [TRT] Loaded engine size: 24 MB
[04/12/2022-20:35:24] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 564 MiB, GPU 2445 MiB
[04/12/2022-20:35:24] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1313, GPU 2787 (MiB)
[04/12/2022-20:35:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +619, GPU +268, now: CPU 1932, GPU 3055 (MiB)
[04/12/2022-20:35:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1223, GPU 2733 (MiB)
[04/12/2022-20:35:25] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1223 MiB, GPU 2733 MiB
../input.png Exists
[04/12/2022-20:35:25] [I] Running TensorRT inference for int8.tlt.engine
[04/12/2022-20:35:25] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1199 MiB, GPU 2733 MiB
[04/12/2022-20:35:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +709, GPU +314, now: CPU 1908, GPU 3047 (MiB)
[04/12/2022-20:35:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1908, GPU 3055 (MiB)
[04/12/2022-20:35:26] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1914 MiB, GPU 5089 MiB
[04/12/2022-20:35:26] [E] [TRT] 3: Cannot find binding of given name: output
[04/12/2022-20:35:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1205, GPU 4743 (MiB)
 

What is the proper output binding name for UNet with a vgg16 backbone? Is it softmax_1?
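
One way to answer this without guessing is to list the bindings the deserialized engine actually exposes; a short sketch against the sample's mEngine (TensorRT 8.x binding API):

// Print every binding name and direction so the correct tensor names
// can be used instead of the hard-coded "output".
for (int i = 0; i < mEngine->getNbBindings(); ++i)
{
    std::printf("binding %d: %s (%s)\n", i,
                mEngine->getBindingName(i),
                mEngine->bindingIsInput(i) ? "input" : "output");
}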

Please refer to the official DeepStream inference GitHub repo for TAO models:

https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/unet_tao/pgie_unet_tao_config.txt#L43

output-blob-names=softmax_1
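
In the quick-start sample that corresponds to looking up the UNet blob names instead of the hard-coded ones, e.g. (assuming the input name from the converter log and the output name from the DeepStream config above):

const int inputIndex  = mEngine->getBindingIndex("input_1");
const int outputIndex = mEngine->getBindingIndex("softmax_1");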
