Hi @AastaLLL,
I managed to run TensorRT inference by adapting your script to the API changes in TensorRT 8.0.1.
I changed the image preprocessing in TensorFlow and in TensorRT to match the preprocessing used during training, and now I get the same predictions from both.
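For completeness, the matching preprocessing on the TensorFlow side looks roughly like this (just a sketch; the load_model path is a placeholder for my Keras model, the important part is that the min-max scaling mirrors the cv2.normalize call in the TensorRT script below):

import cv2
import numpy as np
import tensorflow as tf

# Placeholder path for the trained Keras model - adjust to your own setup.
model = tf.keras.models.load_model('./mobilenetv2-FPT_fix_dim')
image = cv2.imread("testimg.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Min-max scale to [0, 1] as float32, same as in the TensorRT script.
image = cv2.normalize(image, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
probs_tf = model.predict(image[np.newaxis, ...])  # add batch dimension -> (1, 128, 128, 3)
print("TF PROBS: {}".format(probs_tf))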
Here is the script I used for TensorRT inference:
#!/usr/bin/env python3
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import cv2
import numpy as np
import os
import time
TRT_LOGGER = trt.Logger()
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()
# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
def get_engine(onnx_file_path, engine_file_path):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(EXPLICIT_BATCH) as network, \
                builder.create_builder_config() as config, \
                trt.OnnxParser(network, TRT_LOGGER) as parser, \
                trt.Runtime(TRT_LOGGER) as runtime:
            config.max_workspace_size = 1 << 28  # 256MiB
            builder.max_batch_size = 1
            # Parse model file
            if not os.path.exists(onnx_file_path):
                print('ONNX file {} not found.'.format(onnx_file_path))
                exit(0)
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                if not parser.parse(model.read()):
                    print('ERROR: Failed to parse the ONNX file.')
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))
                    return None
            # Fix the NHWC input shape to batch size 1: [1, 128, 128, 3]
            network.get_input(0).shape = [1, 128, 128, 3]
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
            plan = builder.build_serialized_network(network, config)
            engine = runtime.deserialize_cuda_engine(plan)
            print("Completed creating Engine")
            with open(engine_file_path, "wb") as f:
                f.write(plan)
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()
def Inference(engine, image):
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    # The preprocessed float32 HWC image is used directly as the host-side input buffer.
    inputs[0].host = image
    context = engine.create_execution_context()
    start_time = time.time()
    cuda.memcpy_htod_async(inputs[0].device, inputs[0].host, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(outputs[0].host, outputs[0].device, stream)
    stream.synchronize()
    print("Inference time: {:.3f} ms".format((time.time() - start_time) * 1000.0))
    return outputs[0].host
def main():
    # model paths
    onnx_file_path = './mobilenetv2-FPT_fix_dim/model.onnx'
    engine_file_path = './mobilenetv2-FPT_fix_dim/model.onnx.engine'
    # load image and preprocess it the same way as during training
    image = cv2.imread("testimg.png")
    image = cv2.cvtColor(image, code=cv2.COLOR_BGR2RGB)
    #image = image.transpose((2, 0, 1))
    image = cv2.normalize(image, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
    image = np.ascontiguousarray(image)
    # load or build engine
    engine = get_engine(onnx_file_path, engine_file_path)
    probs = Inference(engine, image)
    print("PROBS: {}".format(probs))


if __name__ == '__main__':
    main()
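A quick way to confirm that both backends give the same predictions (sketch only; probs_tf is the Keras prediction from the TensorFlow snippet above, probs is the output of this script):

probs_trt = probs.reshape(probs_tf.shape)
print("max abs diff: {}".format(np.max(np.abs(probs_tf - probs_trt))))
print("same argmax: {}".format(np.argmax(probs_tf) == np.argmax(probs_trt)))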
Thanks for your help
Timo