Jetson-Inference predictions differ from e.g. TensorFlow predictions

Hi everyone!

I just trained an image classification model on my PC using TensorFlow, resulting in a .pb model.
I converted it to ONNX and ran it on my Jetson with jetson-inference.
But surprisingly, the results differ from what I get when I run inference on my PC (with either TensorFlow or ONNX Runtime).
I used the same model and the same image (with the correct RGB representation).

What could be causing this? Or is this normal? Do I have to train my model using jetson-inference?

Thanks in advance


It would be good to check whether this issue comes from TensorRT or from jetson_inference itself.
Would you mind running your model with pure TensorRT directly?

You can find an example below:
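One quick way to do that check, without writing any Python, is the trtexec tool that ships with TensorRT (the path below assumes the standard JetPack layout, and model.onnx is a placeholder for your model):

```shell
# Build an engine from the ONNX model, run a few timed inferences,
# and print the raw output tensors for comparison against TensorFlow
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --dumpOutput
```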


Hi @AastaLLL,

thanks for your reply.
In the meantime, I read that this error could be caused by using TensorRT 7.1.3 (Output from ONNX inference and trt inference are different · Issue #1194 · NVIDIA/TensorRT · GitHub).
I have now updated to JetPack 4.6 with TensorRT 8.0.1, but I still get the same wrong predictions from jetson-inference.

Then I tried your pure TensorRT code to run my model, which resulted in "AttributeError: 'tensorrt.tensorrt.Builder' object has no attribute 'max_workspace_size'". According to (AttributeError: 'tensorrt.tensorrt.Builder' object has no attribute 'max_workspace_size' · Issue #557 · NVIDIA-AI-IOT/torch2trt · GitHub), this is solved by downgrading to TensorRT 7.x.

So I switched back to my JetPack 4.5.1 setup with TensorRT 7.1.3 installed, but running the pure TensorRT code there either ends with "Killed" or just keeps running without any result. It seems that building the engine this way somehow takes longer than with jetson-inference. I guess I could try to build the engine on my PC and only run inference on my Jetson?

Any suggestions?
Thanks for your help

Hi @AastaLLL,

I managed to run TensorRT inference by adapting your script to the TensorRT 8.0.1 API changes.
I also changed the image preprocessing in TensorFlow and TensorRT to match the preprocessing used during training, and now I get the same predictions from both.
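For reference, the matching preprocessing can be written as one function shared between the TensorFlow and TensorRT paths. This is a minimal numpy-only sketch (the function name is mine); it mirrors the cv2.normalize(..., norm_type=cv2.NORM_MINMAX) call in the script:

```python
import numpy as np

def preprocess(img):
    """Min-max normalize an RGB uint8 image to float32 in [0, 1].

    Equivalent to cv2.normalize(img, None, alpha=0, beta=1,
    norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F), so the exact same
    function can feed TensorFlow, ONNX Runtime, and TensorRT.
    """
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi <= lo:  # constant image: avoid division by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)
```

Sharing one function like this removes the most common source of prediction mismatches between runtimes.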

Here is the script I used for TensorRT inference:

#!/usr/bin/env python3
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import cv2
import numpy as np
import os
import time

TRT_LOGGER = trt.Logger()

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to the engine bindings list.
        bindings.append(int(device_mem))
        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

def get_engine(onnx_file_path, engine_file_path):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, trt.OnnxParser(network, TRT_LOGGER) as parser, trt.Runtime(TRT_LOGGER) as runtime:
            config.max_workspace_size = 1 << 28 # 256MiB
            builder.max_batch_size = 1
            # Parse model file
            if not os.path.exists(onnx_file_path):
                print('ONNX file {} not found.'.format(onnx_file_path))
                return None
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                if not parser.parse(
                    print('ERROR: Failed to parse the ONNX file.')
                    for error in range(parser.num_errors):
                        print (parser.get_error(error))
                    return None
            # Force the input to a fixed NHWC shape with batch size 1
            network.get_input(0).shape = [1, 128, 128, 3]
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
            plan = builder.build_serialized_network(network, config)
            engine = runtime.deserialize_cuda_engine(plan)
            print("Completed creating Engine")
            with open(engine_file_path, "wb") as f:
                f.write(plan)
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(
    else:
        return build_engine()

def Inference(engine, image):
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    inputs[0].host = image
    context = engine.create_execution_context()

    start_time = time.time()
    cuda.memcpy_htod_async(inputs[0].device, inputs[0].host, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(outputs[0].host, outputs[0].device, stream)
    # Wait for the asynchronous copies and the kernel to finish
    stream.synchronize()
    print("Inference time: {:.3f}s".format(time.time() - start_time))
    return outputs[0].host

def main():
    #model path
    onnx_file_path = './mobilenetv2-FPT_fix_dim/model.onnx'
    engine_file_path = './mobilenetv2-FPT_fix_dim/model.onnx.engine'

    #load image
    image = cv2.imread("testimg.png")
    image = cv2.cvtColor(image, code=cv2.COLOR_BGR2RGB)
    #image = image.transpose((2, 0, 1))
    image = cv2.normalize(image, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
    image = np.ascontiguousarray(image)

    #load or build engine
    engine = get_engine(onnx_file_path, engine_file_path)
    probs = Inference(engine, image)
    print("PROBS: {}".format(probs))

if __name__ == '__main__':
    main()

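Once both paths produce scores, they can be compared with a small tolerance rather than exact equality, since FP32 results can differ slightly between GPUs and kernel implementations. A minimal sketch with made-up example score vectors:

```python
import numpy as np

# Hypothetical score vectors for the same preprocessed test image:
# one from TensorFlow on the PC, one from the TensorRT script above.
tf_probs = np.array([0.91, 0.06, 0.03], dtype=np.float32)
trt_probs = np.array([0.90, 0.07, 0.03], dtype=np.float32)

# Both runtimes should pick the same class, and the raw scores
# should agree within a small absolute tolerance.
same_class = np.argmax(tf_probs) == np.argmax(trt_probs)
close_scores = np.allclose(tf_probs, trt_probs, atol=1e-2)
print(same_class, close_scores)  # → True True
```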
Thanks for your help

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.