Please provide the following information when requesting support.
• Hardware: RTX 3090
• Network Type: vgg16
• Output of !tao info:
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022
• TensorRT version: 8.4.0-1+cuda10.2
• Spec file: unet_retrain_vgg_6S.txt (1.4 KB)
When loading the engine from a C++ program I get a "Version tag does not match" error:
[04/12/2022-18:45:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +177, GPU +0, now: CPU 202, GPU 2098 (MiB)
[04/12/2022-18:45:04] [I] [TRT] Loaded engine size: 23 MiB
[04/12/2022-18:45:04] [E] [TRT] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[04/12/2022-18:45:04] [E] [TRT] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
segmentation_tutorial: tutorial-runtime.cpp:81: SampleSegmentation::SampleSegmentation(const string&): Assertion `mEngine.get() != nullptr' failed.
Aborted (core dumped)
In the UNet notebook I exported the engine with:
!tao converter -k $KEY \
-c $USER_EXPERIMENT_DIR/export/cal.bin \
-e $USER_EXPERIMENT_DIR/export/int8.tlt.engine \
-i nchw \
-t int8 \
-p input_1,1x3x512x512,4x3x512x512,16x3x512x512 \
$USER_EXPERIMENT_DIR/retrain/weights/model_retrained.etlt
2022-04-12 19:39:15,289 [INFO] root: Registry: ['nvcr.io']
2022-04-12 19:39:15,452 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
[INFO] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 2360 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/filex1sQ36
[INFO] ONNX IR version: 0.0.6
[INFO] Opset version: 11
[INFO] Producer name: keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain: onnxmltools
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 3, 512, 512)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 512, 512) for input: input_1
[INFO] Using optimization profile opt shape: (4, 3, 512, 512) for input: input_1
[INFO] Using optimization profile max shape: (16, 3, 512, 512) for input: input_1
[INFO] [MemUsageSnapshot] Builder begin: CPU 602 MiB, GPU 2360 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1351, GPU 2678 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +619, GPU +268, now: CPU 1970, GPU 2946 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 74016
[INFO] Total Device Persistent Memory: 24048640
[INFO] Total Scratch Memory: 408969216
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 16 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2892, GPU 3381 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2892, GPU 3389 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2892, GPU 3373 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2891, GPU 3357 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2891 MiB, GPU 3357 MiB
2022-04-12 19:40:43,842 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
To try the engine, I am using the TensorRT Quick Start semantic segmentation C++ sample (attached):
tutorial-runtime.cpp (6.4 KB)
built with: make CUDA_INSTALL_DIR=/usr/local/cuda-10.2
The error occurs at line 82 of the sample:
mEngine.reset(runtime->deserializeCudaEngine(engineData.data(), fsize, nullptr));