Inference results get worse when converting a PyTorch model to a TensorRT engine

Hi

I converted a PyTorch YOLOv4 model to an ONNX model and then to a TensorRT engine. I found that the inference results of the TensorRT engine are worse than those of the PyTorch model, in terms of both regression precision and classification score.
I used float32 for both the TensorRT engine and the PyTorch model, and I verified that the ONNX model's inference results match those of the PyTorch model, so I am confused.
Is it normal for TensorRT conversion to degrade regression precision, or is there a method I can use to find out the reason?

Environment

TensorRT Version: 7.1.3.4
GPU Type: Titan RTX
Nvidia Driver Version: 440.87
CUDA Version: 10.2
CUDNN Version: bundled with CUDA
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.2

Python script

import tensorrt as trt

def ONNX_build_engine(onnx_file_path, engine_path):
    """Takes an ONNX file and creates a TensorRT engine to run inference with."""
    G_LOGGER = trt.Logger(trt.Logger.WARNING)
    explicit_batch = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(G_LOGGER) as builder, \
            builder.create_network(explicit_batch) as network, \
            trt.OnnxParser(network, G_LOGGER) as parser:
        builder.max_batch_size = 100
        builder.max_workspace_size = 1 << 30  # 1 GiB
        # Note: this enables FP16 kernels, so the built engine is NOT pure
        # float32; FP16 rounding can degrade regression precision and scores.
        builder.fp16_mode = True

        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                # Surface parser errors instead of failing silently later
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                raise RuntimeError('Failed to parse ONNX file')
        print('Completed parsing of ONNX file')

        print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
        engine = builder.build_cuda_engine(network)
        print("Completed creating Engine")

        with open(engine_path, "wb") as f:
            f.write(engine.serialize())
        return engine
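One way to narrow down where the precision loss comes from is to run the same input through PyTorch and the TensorRT engine and compare the raw output tensors before any post-processing. The helper below is a minimal sketch (the function name `max_diff` and the tolerance values are illustrative, not from the original post); FP16 rounding typically shows up as relative errors on the order of 1e-3, far larger than FP32 round-off.

```python
import numpy as np

def max_diff(ref, test, rel_tol=1e-3):
    """Report absolute and relative differences between two output tensors.

    ref  -- reference output (e.g. from PyTorch), as an array
    test -- output under test (e.g. from the TensorRT engine)
    """
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    abs_diff = np.abs(ref - test)
    # Relative difference, guarding against division by zero
    rel_diff = abs_diff / np.maximum(np.abs(ref), 1e-8)
    return {
        "max_abs": float(abs_diff.max()),
        "max_rel": float(rel_diff.max()),
        "frac_over_tol": float((rel_diff > rel_tol).mean()),
    }

# Synthetic example: simulate FP16 rounding of an FP32 reference tensor
ref = np.linspace(-1.0, 1.0, 1000).astype(np.float32)
fp16_out = ref.astype(np.float16).astype(np.float32)
print(max_diff(ref, fp16_out))
```

If the per-tensor differences are small but detections still degrade, the problem is more likely in pre/post-processing than in the engine itself.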

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "your_model.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi,

We also recommend that you try the latest TensorRT version, 8.2.
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Thank you.

Thank you for your reply. I tried onnx.checker.check_model and no error occurred. I also ran the model with trtexec and got the engine file successfully, but the issue mentioned above still exists. Here is the link to the ONNX model: model_200000.onnx - Google Drive
and I have posted the output log below. I want to know whether this kind of issue commonly occurs.
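To judge whether the regression drop is systematic, one check (not from the original thread) is to compute the IoU between each box predicted by the PyTorch model and the corresponding box from the TensorRT engine: consistently high IoU with slightly shifted coordinates points at numeric precision, while low IoU suggests a structural conversion problem. A minimal numpy sketch, assuming boxes in (x1, y1, x2, y2) format and already matched pairwise:

```python
import numpy as np

def box_iou(a, b):
    """IoU between paired boxes a and b, each of shape (N, 4) as (x1, y1, x2, y2)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    # Intersection rectangle for each pair
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 2], b[:, 2])
    y2 = np.minimum(a[:, 3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a + area_b - inter
    return inter / np.maximum(union, 1e-8)

# Hypothetical example: a PyTorch box and a slightly shifted TensorRT box
pytorch_boxes = np.array([[10.0, 10.0, 110.0, 110.0]])
trt_boxes = np.array([[10.5, 10.2, 110.4, 110.1]])
print(box_iou(pytorch_boxes, trt_boxes))  # close to 1.0 -> numeric drift only
```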

The log of using trtexec

delu@delu:~/Downloads/TensorRT-7.1.3.4/bin$ ./trtexec --onnx=/home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.onnx --shapes=input1:1x3x480x960 --saveEngine=/home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.engine
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=/home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.onnx --shapes=input1:1x3x480x960 --saveEngine=/home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.engine
[01/07/2022-18:34:48] [I] === Model Options ===
[01/07/2022-18:34:48] [I] Format: ONNX
[01/07/2022-18:34:48] [I] Model: /home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.onnx
[01/07/2022-18:34:48] [I] Output:
[01/07/2022-18:34:48] [I] === Build Options ===
[01/07/2022-18:34:48] [I] Max batch: explicit
[01/07/2022-18:34:48] [I] Workspace: 16 MB
[01/07/2022-18:34:48] [I] minTiming: 1
[01/07/2022-18:34:48] [I] avgTiming: 8
[01/07/2022-18:34:48] [I] Precision: FP32
[01/07/2022-18:34:48] [I] Calibration: 
[01/07/2022-18:34:48] [I] Safe mode: Disabled
[01/07/2022-18:34:48] [I] Save engine: /home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.engine
[01/07/2022-18:34:48] [I] Load engine: 
[01/07/2022-18:34:48] [I] Builder Cache: Enabled
[01/07/2022-18:34:48] [I] NVTX verbosity: 0
[01/07/2022-18:34:48] [I] Inputs format: fp32:CHW
[01/07/2022-18:34:48] [I] Outputs format: fp32:CHW
[01/07/2022-18:34:48] [I] Input build shape: input1=1x3x480x960+1x3x480x960+1x3x480x960
[01/07/2022-18:34:48] [I] Input calibration shapes: model
[01/07/2022-18:34:48] [I] === System Options ===
[01/07/2022-18:34:48] [I] Device: 0
[01/07/2022-18:34:48] [I] DLACore: 
[01/07/2022-18:34:48] [I] Plugins:
[01/07/2022-18:34:48] [I] === Inference Options ===
[01/07/2022-18:34:48] [I] Batch: Explicit
[01/07/2022-18:34:48] [I] Input inference shape: input1=1x3x480x960
[01/07/2022-18:34:48] [I] Iterations: 10
[01/07/2022-18:34:48] [I] Duration: 3s (+ 200ms warm up)
[01/07/2022-18:34:48] [I] Sleep time: 0ms
[01/07/2022-18:34:48] [I] Streams: 1
[01/07/2022-18:34:48] [I] ExposeDMA: Disabled
[01/07/2022-18:34:48] [I] Spin-wait: Disabled
[01/07/2022-18:34:48] [I] Multithreading: Disabled
[01/07/2022-18:34:48] [I] CUDA Graph: Disabled
[01/07/2022-18:34:48] [I] Skip inference: Disabled
[01/07/2022-18:34:48] [I] Inputs:
[01/07/2022-18:34:48] [I] === Reporting Options ===
[01/07/2022-18:34:48] [I] Verbose: Disabled
[01/07/2022-18:34:48] [I] Averages: 10 inferences
[01/07/2022-18:34:48] [I] Percentile: 99
[01/07/2022-18:34:48] [I] Dump output: Disabled
[01/07/2022-18:34:48] [I] Profile: Disabled
[01/07/2022-18:34:48] [I] Export timing to JSON file: 
[01/07/2022-18:34:48] [I] Export output to JSON file: 
[01/07/2022-18:34:48] [I] Export profile to JSON file: 
[01/07/2022-18:34:48] [I] 
----------------------------------------------------------------
Input filename:   /home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.6
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[01/07/2022-18:34:50] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/07/2022-18:34:52] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[01/07/2022-18:36:34] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[01/07/2022-18:36:38] [I] Starting inference threads
[01/07/2022-18:36:42] [I] Warmup completed 0 queries over 200 ms
[01/07/2022-18:36:42] [I] Timing trace has 0 queries over 3.04169 s
[01/07/2022-18:36:42] [I] Trace averages of 10 runs:
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3957 ms - Host latency: 16.3063 ms (end to end 26.1012 ms, enqueue 0.883246 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3967 ms - Host latency: 16.2882 ms (end to end 26.222 ms, enqueue 0.871765 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4199 ms - Host latency: 16.3301 ms (end to end 26.157 ms, enqueue 0.889816 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4348 ms - Host latency: 16.3415 ms (end to end 26.1062 ms, enqueue 1.72222 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4072 ms - Host latency: 16.3325 ms (end to end 25.9674 ms, enqueue 1.99579 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4105 ms - Host latency: 16.3316 ms (end to end 25.8559 ms, enqueue 1.93925 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3985 ms - Host latency: 16.3168 ms (end to end 25.8229 ms, enqueue 1.83491 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.39 ms - Host latency: 16.3459 ms (end to end 25.8012 ms, enqueue 2.12814 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4001 ms - Host latency: 16.3281 ms (end to end 25.8814 ms, enqueue 1.98121 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3887 ms - Host latency: 16.3 ms (end to end 25.8978 ms, enqueue 1.72109 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3901 ms - Host latency: 16.3127 ms (end to end 25.9992 ms, enqueue 0.854712 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3916 ms - Host latency: 16.3064 ms (end to end 25.7594 ms, enqueue 1.07394 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3988 ms - Host latency: 16.3142 ms (end to end 26.0596 ms, enqueue 0.892187 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3918 ms - Host latency: 16.2903 ms (end to end 25.9768 ms, enqueue 0.860828 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3918 ms - Host latency: 16.2917 ms (end to end 26.0154 ms, enqueue 0.869165 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4003 ms - Host latency: 16.3249 ms (end to end 26.0823 ms, enqueue 0.857227 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.399 ms - Host latency: 16.3136 ms (end to end 26.1556 ms, enqueue 1.20142 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3944 ms - Host latency: 16.2908 ms (end to end 26.0546 ms, enqueue 1.20337 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.4003 ms - Host latency: 16.3129 ms (end to end 26.1759 ms, enqueue 1.19958 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3969 ms - Host latency: 16.3257 ms (end to end 26.1223 ms, enqueue 1.28875 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3968 ms - Host latency: 16.318 ms (end to end 25.78 ms, enqueue 2.04983 ms)
[01/07/2022-18:36:42] [I] Average on 10 runs - GPU latency: 13.3886 ms - Host latency: 16.3038 ms (end to end 25.7963 ms, enqueue 1.97502 ms)
[01/07/2022-18:36:42] [I] Host Latency
[01/07/2022-18:36:42] [I] min: 16.2292 ms (end to end 25.1312 ms)
[01/07/2022-18:36:42] [I] max: 16.6643 ms (end to end 26.4575 ms)
[01/07/2022-18:36:42] [I] mean: 16.3163 ms (end to end 25.9828 ms)
[01/07/2022-18:36:42] [I] median: 16.3057 ms (end to end 25.9833 ms)
[01/07/2022-18:36:42] [I] percentile: 16.4579 ms at 99% (end to end 26.4144 ms at 99%)
[01/07/2022-18:36:42] [I] throughput: 0 qps
[01/07/2022-18:36:42] [I] walltime: 3.04169 s
[01/07/2022-18:36:42] [I] Enqueue Time
[01/07/2022-18:36:42] [I] min: 0.820068 ms
[01/07/2022-18:36:42] [I] max: 3.15674 ms
[01/07/2022-18:36:42] [I] median: 1.19226 ms
[01/07/2022-18:36:42] [I] GPU Compute
[01/07/2022-18:36:42] [I] min: 13.3418 ms
[01/07/2022-18:36:42] [I] max: 13.5137 ms
[01/07/2022-18:36:42] [I] mean: 13.3989 ms
[01/07/2022-18:36:42] [I] median: 13.396 ms
[01/07/2022-18:36:42] [I] percentile: 13.4963 ms at 99%
[01/07/2022-18:36:42] [I] total compute time: 3.02814 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=/home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.onnx --shapes=input1:1x3x480x960 --saveEngine=/home/delu/deeplearning/yolo3Dv9.1_8.2/scripts/model_200000.engine

Thank you, but I need to deploy my model on an NVIDIA ECU that has a fixed TensorRT environment, so it's better not to change the TensorRT version.

Hi,

Could you please share a model and script that reproduce the issue, so we can try from our end for better debugging.

Thank you.