Description
I'm trying to convert a PyTorch model to TensorRT to run on a Jetson Nano; however, the converted model loses a large amount of accuracy compared to the original model.
The original model is a slightly adapted version of pasqualedem's excellent crowd counting model. From this, I used a 540x960 model instead of the standard 1080x1960 one, as my computer did not have enough GPU memory to convert the 1080x1960 model into an ONNX file.
Regardless, the 540x960 model still works very well, as shown by the following image: a heatmap of the model's output overlaid on the original image.
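For context, the overlay itself is nothing special: the model's single-channel 540x960 density map is colour-mapped and alpha-blended onto the frame. A minimal sketch of that step (the function and parameter names here are illustrative, not my exact script):

import cv2
import numpy as np

def overlay_heatmap(frame_bgr, density_map, alpha=0.5):
    # frame_bgr: HxWx3 uint8 image, density_map: single-channel float array from the model
    dm = cv2.resize(density_map, (frame_bgr.shape[1], frame_bgr.shape[0]))
    dm = (255 * (dm - dm.min()) / (dm.max() - dm.min() + 1e-8)).astype(np.uint8)
    heat = cv2.applyColorMap(dm, cv2.COLORMAP_JET)
    return cv2.addWeighted(heat, alpha, frame_bgr, 1 - alpha, 0)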
The ONNX file was created using:
model.eval()
x = torch.randn(1, 3, 540, 960, device='cuda')
input_names = ["actual_input_1"] + ["learned_%d" % i for i in range(16)]
output_names = ["output1"]
torch.onnx.export(model, x, "DroneCrowd11-550x960.onnx", verbose=True, input_names=input_names, output_names=output_names, opset_version=11)
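As a sanity check before building the engine, the exported ONNX file can be compared against the PyTorch model on the same random input. A rough sketch using onnxruntime, reusing model and x from the export snippet above (the input name matches the export; the check assumes the model returns a single density-map tensor):

import numpy as np
import onnxruntime as ort
import torch

with torch.no_grad():
    torch_out = model(x).cpu().numpy()

sess = ort.InferenceSession("DroneCrowd11-550x960.onnx", providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"actual_input_1": x.cpu().numpy()})[0]

print("max abs diff:", np.abs(torch_out - onnx_out).max())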
Running trtexec on the resulting file returns the following:
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=DroneCrowd11-550x960.onnx --shapes=input:1x3x550x960
[07/12/2021-12:05:53] [I] === Model Options ===
[07/12/2021-12:05:53] [I] Format: ONNX
[07/12/2021-12:05:53] [I] Model: DroneCrowd11-550x960.onnx
[07/12/2021-12:05:53] [I] Output:
[07/12/2021-12:05:53] [I] === Build Options ===
[07/12/2021-12:05:53] [I] Max batch: explicit
[07/12/2021-12:05:53] [I] Workspace: 16 MB
[07/12/2021-12:05:53] [I] minTiming: 1
[07/12/2021-12:05:53] [I] avgTiming: 8
[07/12/2021-12:05:53] [I] Precision: FP32
[07/12/2021-12:05:53] [I] Calibration:
[07/12/2021-12:05:53] [I] Safe mode: Disabled
[07/12/2021-12:05:53] [I] Save engine:
[07/12/2021-12:05:53] [I] Load engine:
[07/12/2021-12:05:53] [I] Builder Cache: Enabled
[07/12/2021-12:05:53] [I] NVTX verbosity: 0
[07/12/2021-12:05:53] [I] Inputs format: fp32:CHW
[07/12/2021-12:05:53] [I] Outputs format: fp32:CHW
[07/12/2021-12:05:53] [I] Input build shape: input=1x3x550x960+1x3x550x960+1x3x550x960
[07/12/2021-12:05:53] [I] Input calibration shapes: model
[07/12/2021-12:05:53] [I] === System Options ===
[07/12/2021-12:05:53] [I] Device: 0
[07/12/2021-12:05:53] [I] DLACore:
[07/12/2021-12:05:53] [I] Plugins:
[07/12/2021-12:05:53] [I] === Inference Options ===
[07/12/2021-12:05:53] [I] Batch: Explicit
[07/12/2021-12:05:53] [I] Input inference shape: input=1x3x550x960
[07/12/2021-12:05:53] [I] Iterations: 10
[07/12/2021-12:05:53] [I] Duration: 3s (+ 200ms warm up)
[07/12/2021-12:05:53] [I] Sleep time: 0ms
[07/12/2021-12:05:53] [I] Streams: 1
[07/12/2021-12:05:53] [I] ExposeDMA: Disabled
[07/12/2021-12:05:53] [I] Spin-wait: Disabled
[07/12/2021-12:05:53] [I] Multithreading: Disabled
[07/12/2021-12:05:53] [I] CUDA Graph: Disabled
[07/12/2021-12:05:53] [I] Skip inference: Disabled
[07/12/2021-12:05:53] [I] Inputs:
[07/12/2021-12:05:53] [I] === Reporting Options ===
[07/12/2021-12:05:53] [I] Verbose: Disabled
[07/12/2021-12:05:53] [I] Averages: 10 inferences
[07/12/2021-12:05:53] [I] Percentile: 99
[07/12/2021-12:05:53] [I] Dump output: Disabled
[07/12/2021-12:05:53] [I] Profile: Disabled
[07/12/2021-12:05:53] [I] Export timing to JSON file:
[07/12/2021-12:05:53] [I] Export output to JSON file:
[07/12/2021-12:05:53] [I] Export profile to JSON file:
[07/12/2021-12:05:53] [I]
----------------------------------------------------------------
Input filename: DroneCrowd11-550x960.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.8
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[07/12/2021-12:05:57] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/12/2021-12:06:51] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[07/12/2021-12:11:02] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[07/12/2021-12:11:03] [I] Starting inference threads
[07/12/2021-12:11:06] [I] Warmup completed 0 queries over 200 ms
[07/12/2021-12:11:06] [I] Timing trace has 0 queries over 3.14148 s
[07/12/2021-12:11:06] [I] Trace averages of 10 runs:
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8122 ms - Host latency: 50.6448 ms (end to end 50.6581 ms, enqueue 3.42705 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8618 ms - Host latency: 50.6932 ms (end to end 50.7064 ms, enqueue 3.45831 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.7726 ms - Host latency: 50.6025 ms (end to end 50.6155 ms, enqueue 3.403 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8534 ms - Host latency: 50.6886 ms (end to end 50.7018 ms, enqueue 3.35613 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.7855 ms - Host latency: 50.6194 ms (end to end 50.6327 ms, enqueue 3.45796 ms)
[07/12/2021-12:11:06] [I] Average on 10 runs - GPU latency: 49.8102 ms - Host latency: 50.6415 ms (end to end 50.6546 ms, enqueue 3.37891 ms)
[07/12/2021-12:11:06] [I] Host Latency
[07/12/2021-12:11:06] [I] min: 50.4688 ms (end to end 50.481 ms)
[07/12/2021-12:11:06] [I] max: 51.1907 ms (end to end 51.2039 ms)
[07/12/2021-12:11:06] [I] mean: 50.6553 ms (end to end 50.6684 ms)
[07/12/2021-12:11:06] [I] median: 50.636 ms (end to end 50.6492 ms)
[07/12/2021-12:11:06] [I] percentile: 51.1907 ms at 99% (end to end 51.2039 ms at 99%)
[07/12/2021-12:11:06] [I] throughput: 0 qps
[07/12/2021-12:11:06] [I] walltime: 3.14148 s
[07/12/2021-12:11:06] [I] Enqueue Time
[07/12/2021-12:11:06] [I] min: 3.20786 ms
[07/12/2021-12:11:06] [I] max: 3.78125 ms
[07/12/2021-12:11:06] [I] median: 3.40656 ms
[07/12/2021-12:11:06] [I] GPU Compute
[07/12/2021-12:11:06] [I] min: 49.634 ms
[07/12/2021-12:11:06] [I] max: 50.3528 ms
[07/12/2021-12:11:06] [I] mean: 49.8229 ms
[07/12/2021-12:11:06] [I] median: 49.8029 ms
[07/12/2021-12:11:06] [I] percentile: 50.3528 ms at 99%
[07/12/2021-12:11:06] [I] total compute time: 3.08902 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=DroneCrowd11-550x960.onnx --shapes=input:1x3x550x960
The following files are what I am using to run the model on the Jetson Nano:
Python script = deepstream-crowd-detection.py (11.4 KB)
Spec file = crowd_detector.txt (3.0 KB)
Onnx model = DroneCrowd11-540x960.onnx (3.3 MB)
Engine file = DroneCrowd11-540x960.onnx_b1_gpu0_fp16.engine (7.0 MB)
Description of use = README (1.5 KB)
Currently, I have the script set up to dump the output of one frame as a 540x960 NumPy array in a .txt file, and the format of that output is as expected. However, when this array is run through the heatmap generation, the result is highly inaccurate.
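To put a number on the difference rather than just comparing heatmaps, the dumped array can be checked directly against the PyTorch density map for the same frame (a sketch; the file names are placeholders for wherever the two outputs are saved):

import numpy as np

trt_map = np.loadtxt("trt_frame0.txt")    # 540x960 map dumped from the DeepStream pipeline (placeholder name)
ref_map = np.load("pytorch_frame0.npy")   # same frame through the original PyTorch model (placeholder name)

print("max abs diff:", np.abs(trt_map - ref_map).max())
print("predicted count:", trt_map.sum(), "vs", ref_map.sum())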
On the same image as shown above, the TensorRT model produces this:
It misses most of the pedestrians that the original model was detecting.
Any advice on what I can change to make this model closer to the PyTorch model would be greatly appreciated.
Thanks
Environment
TensorRT Version: 7.1.3-1
GPU Type: NVIDIA Jetson-Nano 4GB
Nvidia Driver Version: Flashed with jetson-nano-jp451-sd-card-image
CUDA Version: 10.2
CUDNN Version: 8.0.0.180-1
Operating System + Version: Ubuntu 18.04