PyTorch model losing accuracy when converting to TensorRT

Description

I’m trying to convert a PyTorch model to TensorRT to run on a Jetson Nano; however, the converted model massively loses accuracy compared to the original.

The original model is a slightly adapted version of pasqualedems’ excellent crowd-counting model. I used a 540x960 variant instead of the standard 1080x1960 one, as my computer did not have enough GPU memory to convert the 1080x1960 model to an ONNX file.

Regardless, the 540x960 model still works very well, as shown by the following image: a heatmap of the model’s output overlaid on the original image.

The ONNX file was created using:

model.eval()
x = torch.randn(1, 3, 540, 960, device='cuda')
input_names = ["actual_input_1"] + ["learned_%d" % i for i in range(16)]
output_names = ["output1"]
torch.onnx.export(model, x, "DroneCrowd11-550x960.onnx", verbose=True, input_names=input_names, output_names=output_names, opset_version=11)

and trtexec returns this:

&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=DroneCrowd11-550x960.onnx --shapes=input:1x3x550x960
[07/12/2021-12:05:53] [I] === Model Options ===
[07/12/2021-12:05:53] [I] Format: ONNX
[07/12/2021-12:05:53] [I] Model: DroneCrowd11-550x960.onnx
[07/12/2021-12:05:53] [I] Output:
[07/12/2021-12:05:53] [I] === Build Options ===
[07/12/2021-12:05:53] [I] Max batch: explicit
[07/12/2021-12:05:53] [I] Workspace: 16 MB
[07/12/2021-12:05:53] [I] minTiming: 1
[07/12/2021-12:05:53] [I] avgTiming: 8
[07/12/2021-12:05:53] [I] Precision: FP32
[07/12/2021-12:05:53] [I] Calibration:
[07/12/2021-12:05:53] [I] Safe mode: Disabled
[07/12/2021-12:05:53] [I] Save engine:
[07/12/2021-12:05:53] [I] Load engine:
[07/12/2021-12:05:53] [I] Builder Cache: Enabled
[07/12/2021-12:05:53] [I] NVTX verbosity: 0
[07/12/2021-12:05:53] [I] Inputs format: fp32:CHW
[07/12/2021-12:05:53] [I] Outputs format: fp32:CHW
[07/12/2021-12:05:53] [I] Input build shape: input=1x3x550x960+1x3x550x960+1x3x550x960
[07/12/2021-12:05:53] [I] Input calibration shapes: model
[07/12/2021-12:05:53] [I] === System Options ===
[07/12/2021-12:05:53] [I] Device: 0
[07/12/2021-12:05:53] [I] DLACore:
[07/12/2021-12:05:53] [I] Plugins:
[07/12/2021-12:05:53] [I] === Inference Options ===
[07/12/2021-12:05:53] [I] Batch: Explicit
[07/12/2021-12:05:53] [I] Input inference shape: input=1x3x550x960
[07/12/2021-12:05:53] [I] Iterations: 10
[07/12/2021-12:05:53] [I] Duration: 3s (+ 200ms warm up)
[07/12/2021-12:05:53] [I] Sleep time: 0ms
[07/12/2021-12:05:53] [I] Streams: 1
[07/12/2021-12:05:53] [I] ExposeDMA: Disabled
[07/12/2021-12:05:53] [I] Spin-wait: Disabled
[07/12/2021-12:05:53] [I] Multithreading: Disabled
[07/12/2021-12:05:53] [I] CUDA Graph: Disabled
[07/12/2021-12:05:53] [I] Skip inference: Disabled
[07/12/2021-12:05:53] [I] Inputs:
[07/12/2021-12:05:53] [I] === Reporting Options ===
[07/12/2021-12:05:53] [I] Verbose: Disabled
[07/12/2021-12:05:53] [I] Averages: 10 inferences
[07/12/2021-12:05:53] [I] Percentile: 99
[07/12/2021-12:05:53] [I] Dump output: Disabled
[07/12/2021-12:05:53] [I] Profile: Disabled
[07/12/2021-12:05:53] [I] Export timing to JSON file:
[07/12/2021-12:05:53] [I] Export output to JSON file:
[07/12/2021-12:05:53] [I] Export profile to JSON file:
[07/12/2021-12:05:53] [I]
.----------------------------------------------------------------
Input filename: DroneCrowd11-550x960.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.8
Domain:
Model version: 0
Doc string:
.----------------------------------------------------------------
[07/12/2021-12:05:57] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/12/2021-12:06:51] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[07/12/2021-12:11:02] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[07/12/2021-12:11:03] [I] Starting inference threads
[07/12/2021-12:11:06] [I] Warmup completed 0 queries over 200 ms
[07/12/2021-12:11:06] [I] Timing trace has 0 queries over 3.14148 s
[07/12/2021-12:11:06] [I] Trace averages of 10 runs:
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8122 ms - Host latency: 50.6448 ms (end to end 50.6581 ms, enqueue 3.42705 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8618 ms - Host latency: 50.6932 ms (end to end 50.7064 ms, enqueue 3.45831 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.7726 ms - Host latency: 50.6025 ms (end to end 50.6155 ms, enqueue 3.403 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8534 ms - Host latency: 50.6886 ms (end to end 50.7018 ms, enqueue 3.35613 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.7855 ms - Host latency: 50.6194 ms (end to end 50.6327 ms, enqueue 3.45796 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8102 ms - Host latency: 50.6415 ms (end to end 50.6546 ms, enqueue 3.37891 ms)
[07/12/2021-12:11:06] [I] Host Latency
[07/12/2021-12:11:06] [I] min: 50.4688 ms (end to end 50.481 ms)
[07/12/2021-12:11:06] [I] max: 51.1907 ms (end to end 51.2039 ms)
[07/12/2021-12:11:06] [I] mean: 50.6553 ms (end to end 50.6684 ms)
[07/12/2021-12:11:06] [I] median: 50.636 ms (end to end 50.6492 ms)
[07/12/2021-12:11:06] [I] percentile: 51.1907 ms at 99% (end to end 51.2039 ms at 99%)
[07/12/2021-12:11:06] [I] throughput: 0 qps
[07/12/2021-12:11:06] [I] walltime: 3.14148 s
[07/12/2021-12:11:06] [I] Enqueue Time
[07/12/2021-12:11:06] [I] min: 3.20786 ms
[07/12/2021-12:11:06] [I] max: 3.78125 ms
[07/12/2021-12:11:06] [I] median: 3.40656 ms
[07/12/2021-12:11:06] [I] GPU Compute
[07/12/2021-12:11:06] [I] min: 49.634 ms
[07/12/2021-12:11:06] [I] max: 50.3528 ms
[07/12/2021-12:11:06] [I] mean: 49.8229 ms
[07/12/2021-12:11:06] [I] median: 49.8029 ms
[07/12/2021-12:11:06] [I] percentile: 50.3528 ms at 99%
[07/12/2021-12:11:06] [I] total compute time: 3.08902 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=DroneCrowd11-550x960.onnx --shapes=input:1x3x550x960

The following files are what I am using to run the model on the Jetson Nano:
Python script = deepstream-crowd-detection.py (11.4 KB)
Spec file = crowd_detector.txt (3.0 KB)
Onnx model = DroneCrowd11-540x960.onnx (3.3 MB)
Engine file = DroneCrowd11-540x960.onnx_b1_gpu0_fp16.engine (7.0 MB)
Description of use = README (1.5 KB)

Currently, I have it set to dump the output of one frame as a 540x960 NumPy array in a txt file, and that array has the expected format. However, when the array is run through the heatmap generation, it is clear the output has become highly inaccurate.
On the same image as shown above, the TensorRT model produces this:
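One way to quantify the drop is to compare the dumped array against a reference density map saved from the PyTorch model on the same frame. A minimal sketch (the filenames are hypothetical, and synthetic stand-in data is generated here so the snippet is self-contained; in practice you would load the two real dumps):

```python
import numpy as np

# Synthetic stand-ins for the dumped density maps; in practice these would
# be the 540x960 arrays written out by the DeepStream probe and by the
# PyTorch model on the same frame (the filenames are hypothetical).
rng = np.random.default_rng(0)
reference = rng.random((540, 960)).astype(np.float32)
np.savetxt("pytorch_frame0.txt", reference)
np.savetxt("trt_frame0.txt", reference * 0.9)  # pretend TRT under-predicts

trt_map = np.loadtxt("trt_frame0.txt")
ref_map = np.loadtxt("pytorch_frame0.txt")

# The crowd count is the sum of the density map, so compare counts first,
# then the worst per-pixel error.
count_ratio = trt_map.sum() / ref_map.sum()
max_diff = np.abs(trt_map - ref_map).max()
print(f"count ratio: {count_ratio:.3f}, max abs pixel diff: {max_diff:.3f}")
```

If the counts agree but the per-pixel error is large, the problem is likely spatial (e.g. preprocessing or layout); if the counts themselves diverge, the engine itself is producing a different prediction.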


This misses most of the pedestrians the original model was detecting.

Any advice on what I can change to make this model closer to the PyTorch model would be greatly appreciated.

Thanks

Environment

TensorRT Version: 7.1.3-1
GPU Type: NVIDIA Jetson-Nano 4GB
Nvidia Driver Version: Flashed with jetson-nano-jp451-sd-card-image
CUDA Version: 10.2
CUDNN Version: 8.0.0.180-1
Operating System + Version: Ubuntu 18.04

Hi,
Could you share the ONNX model and the script, if not already shared, so that we can assist you better?
Alongside, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

1. Validate your model with the snippet below.

check_model.py

import onnx

# Replace with the path to your ONNX model.
filename = "yourONNXmodel.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)

2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi,

The information you asked for was provided in my original post. Please can you advise me on how I can improve the TensorRT model?

Thanks,

Hi @LwsChlds,

Please allow us some time to try from our end. Meanwhile, could you please confirm that when you run inference with ONNX Runtime, you get the same accuracy as the PyTorch model and do not see the accuracy drop?

Thank you.

Hi,

Unfortunately, I haven’t been able to get an output prediction from ONNX Runtime that can be used to produce a heatmap, so I cannot verify whether the model works as expected.

Do you think the issue could be with how I created the ONNX model?

Thanks

@LwsChlds,

It is also possible that accuracy changes after converting to an ONNX model, depending on how the layers are optimized, so it is best to confirm whether you get the correct accuracy with ONNX Runtime.
We recommend validating the ONNX Runtime output as well.

Also, could you share the output log from the trtexec command you’re running with the --verbose flag?

Thank you.

Hi,

I have tested the ONNX file with ONNX Runtime, and it produces the same outputs as the original model.
I fixed the initial problem I was having with ONNX Runtime by using the same image pre-processing that the original PyTorch model uses.

Could the image pre-processing on the TensorRT side also be the cause of the problem?

I have attached the trtexec --verbose output as requested:
trtexec-verbose-output.txt (601.7 KB)

Thanks for your assistance.

@LwsChlds,

It looks like you’re using DeepStream.
Please post your question on the DeepStream forum to get better help.

Thank you.

@LwsChlds

I recently made a post on the forums about a similar issue: Pytorch to Onnx to TRT: Unstable Output when running TRTExec

When I exported from PyTorch to ONNX to TRT, I observed that the output of my model was unstable, and the output confidence tended to drop by 5-10%.

In your model, how different is the output of TRT from the output of the PyTorch model on a specific frame? If they are very similar (just lower confidence), you may be hitting the same issue I have been facing.

I am also using CUDA 10.2 and TensorRT 7.1.3, and I am running on a Jetson Xavier.

Hi @VivekKrishnan

The output of TRT is quite different from the PyTorch/ONNX Runtime models, but I believe that might be due to the different image pre-processing being run, as I don’t know what values to use for the net-scale-factor or offsets.
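The DeepStream values can be derived from the PyTorch normalization constants. nvinfer computes y = net-scale-factor * (x - offset) on [0,255] pixels, while torchvision-style normalization computes y = (x/255 - mean) / std. A sketch of the conversion, assuming the common ImageNet mean/std (substitute whatever your model was actually trained with):

```python
# Assumed PyTorch normalization constants (ImageNet defaults; replace with
# the values the crowd-counting model was actually trained with).
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# PyTorch: y = (x/255 - mean)/std = (x - 255*mean) / (255*std)
# DeepStream: y = net-scale-factor * (x - offset)
# => offsets = 255*mean, scale = 1/(255*std) per channel.
offsets = [255.0 * m for m in mean]
scale_factors = [1.0 / (255.0 * s) for s in std]

# net-scale-factor is a single scalar in the nvinfer config, so when the
# per-channel stds differ it can only be approximated (here by the mean).
net_scale_factor = sum(scale_factors) / len(scale_factors)

print("offsets =", ";".join(f"{o:.3f}" for o in offsets))
print("net-scale-factor =", f"{net_scale_factor:.6f}")
```

Note that nvinfer applies the offsets in the order of the configured color format, so the channel order (RGB vs BGR) in the config has to match the order of the mean values as well.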

Did your model run the same on ONNX Runtime as in PyTorch?
Also, what pre-processing are you doing in PyTorch and on TRT?

Hey there, I was running my model in TRT directly (loading from ONNX), not in ONNX Runtime. I was doing some image preprocessing, but I was able to validate that the preprocessing was the same between PyTorch and ONNX.