How to save time by converting ONNX to TensorRT

Description

This is a very basic question. Every time I run the attached source code, it spends time converting the ONNX model into a TensorRT engine. How can I eliminate the time spent on this conversion? I am implementing it while referring to the sample source code below, but I can't work it out.

For example, can I load the model.trt generated by running trtexec instead of parsing the ONNX file every time?
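Something like the following is what I have in mind (just a sketch; the helper name is mine, the path is from my environment, and I am assuming model.trt is a serialized engine written by trtexec):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_prebuilt_engine(engine_path):
    # Read the serialized engine from disk and deserialize it directly,
    # skipping the ONNX parse and build step entirely.
    with open(engine_path, "rb") as f:
        engine_bytes = f.read()
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(engine_bytes)

# e.g. engine = load_prebuilt_engine("/home/via/sandbox/python/segmentation/model.trt")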

Environment

TensorRT Version: 10.3.0.30
GPU Type: NVIDIA Jetson Orin NX 8GB (VIA AMOS-9100)
Nvidia Driver Version: JetPack 6.1?
CUDA Version: 12.6.68
CUDNN Version: 9.3.0.75
Operating System + Version: JetPack 6.1 [L4T 36.4.0]
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

import tensorrt as trt
import cv2
import numpy as np
import common

ENGINE_FILE_PATH = "/home/via/sandbox/python/segmentation/model.trt"
ONNX_FILE_PATH = "/home/via/sandbox/python/segmentation/model.onnx"
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

class ModelData(object):
    MODEL_PATH = ONNX_FILE_PATH
    INPUT_SHAPE = (3, 288, 288)
    # We can convert TensorRT data types to numpy types with trt.nptype()
    DTYPE = trt.float32

# The Onnx path is used for Onnx models.
def build_engine_onnx(model_file):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(0)
    config = builder.create_builder_config()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, common.GiB(1))
    # Load the Onnx model and parse it in order to populate the TensorRT network.
    with open(model_file, "rb") as model:
        if not parser.parse(model.read()):
            print("ERROR: Failed to parse the ONNX file.")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    engine_bytes = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(engine_bytes)

def get_input_image_tensor():
    # PreProcess
    bgr_image = cv2.imread("./dog.jpg")
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    height, width, channel = rgb_image.shape

    size = min(height, width)
    top = int((height - size) / 2)
    left = int((width - size) / 2)
    bottom = top + size
    right = left + size
    crop_img = rgb_image[top:bottom, left:right]

    rgb_ds = cv2.resize(crop_img,(288, 288))
    rgb_nchw = np.transpose(rgb_ds, (2, 0, 1))
    rgb_nchw = (rgb_nchw / 128.0) - 1.0
    rgb_batch = rgb_nchw[np.newaxis,:]

    return rgb_batch

def main():
    onnx_model_file = ONNX_FILE_PATH
    engine = build_engine_onnx(onnx_model_file)
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    context = engine.create_execution_context()
    input_tensor = get_input_image_tensor()
    inputs[0].host = np.array(input_tensor, dtype='<f4')
    trt_outputs = common.do_inference(
        context,
        engine=engine,
        bindings=bindings,
        inputs=inputs,
        outputs=outputs,
        stream=stream,
    )
    print(trt_outputs)
    quit()

if __name__ == "__main__":
    main()

Steps To Reproduce

To reproduce the problem, unpack the above tar.gz file, move into “sandbox/python/segmentation”, and run “python trt_resnet.py”. common.py and common_runtime.py were copied from the sample source code below.

Since “build_serialized_network” returns the model converted from ONNX as byte data, I saved that data to a file and use it as a cache, as in the function below. Does this serialized data have a standard name or file extension?

import os  # needed for os.path.isfile() below

# The ONNX path is used for ONNX models; cache_file is the path of the serialized engine cache.
def build_engine_onnx(model_file, cache_file):
    is_file = os.path.isfile(cache_file)
    if is_file:
        print("cache exist:", cache_file)
        with open(cache_file, 'rb') as f:
            engine_bytes = f.read()
    else:
        print("cache not exist:", cache_file)
        print("generate TensorRT model from onnx:", model_file)
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(0)
        config = builder.create_builder_config()
        parser = trt.OnnxParser(network, TRT_LOGGER)

        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, common.GiB(1))
        # Load the Onnx model and parse it in order to populate the TensorRT network.
        with open(model_file, "rb") as model:
            if not parser.parse(model.read()):
                print("ERROR: Failed to parse the ONNX file.")
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        engine_bytes = builder.build_serialized_network(network, config)
        with open(cache_file, 'wb') as f:
            f.write(engine_bytes)

    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(engine_bytes)
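With this change, main() only has to pass the cache path as well; everything after engine creation stays the same (a sketch reusing the constants already defined at the top of my script):

def main():
    # Reuse the cached engine if it exists; otherwise build it from ONNX and cache it.
    engine = build_engine_onnx(ONNX_FILE_PATH, ENGINE_FILE_PATH)
    # ... the rest of main() is unchanged ...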

I looked at the source code below and found that the appropriate extension for the serialized engine data is *.engine.
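So in my script only the cache constant at the top needs to change to match that convention:

ENGINE_FILE_PATH = "/home/via/sandbox/python/segmentation/model.engine"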

All my questions are answered.
Thank you.