TensorRT inference unstable compared to ONNX Runtime

Description

Hi,
I used the PyTorch Image Models (timm) model resnest14d and converted it to an ONNX model and a TensorRT engine. Below are my steps and results:

  1. Define the resnest14d model in PyTorch with pretrained=True, load an image to test, and get an output tensor of shape [1,1000]. Using torch.max, the max value is 8.284667 and the class index is 207.

  2. Convert this PyTorch model to an ONNX model successfully. Inference with ONNX Runtime gives a max value of 8.284672 and class index 207. This is great because the results are very close.

  3. Converting this ONNX model to a TRT engine failed.
    Python and C++ (trtexec) give the same ERROR:

ERROR: Failed to parse the ONNX file.
In node 39 (importGlobalAveragePool): UNSUPPORTED_NODE: Assertion failed: !isDynamic(kernelSize) && "Cannot run global average pool on an input with dynamic spatial dimensions!"

  4. But TensorRT 7.0 supports global average pool → support-matrix
    So I tried simplifying the ONNX model with this package → onnx-simplifier and converted it to a TRT engine again → success!

command: trtexec --onnx=resnest14d_s.onnx --saveEngine=resnest14d_s.trt --fp16 --explicitBatch

  5. I run inference on an image with this TRT engine and get a different result each time I repeat step 4.
    Sometimes I get a result very close to what I expect (max value: 8.278888, class: 207), but I don't know why, which leaves me very confused.

I've hit a snag here. Do you have any recommendations?
Thank you!
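The comparison made by hand in the steps above (same class index, nearly equal max value) can be written as a small check. This is only a sketch: `compare_outputs` is a hypothetical helper name, and it assumes the outputs of two backends have already been flattened to plain lists of floats of equal length.

```python
def compare_outputs(reference, candidate):
    """Return (same_top1, max_abs_diff) for two equal-length score lists."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    # Index of the largest score in each output (the predicted class).
    top_ref = max(range(len(reference)), key=reference.__getitem__)
    top_cand = max(range(len(candidate)), key=candidate.__getitem__)
    # Largest element-wise deviation between the two outputs.
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return top_ref == top_cand, max_abs_diff


# Only the max values were reported in this post, so this compares those
# scalars; in practice the full [1,1000] outputs would be passed instead.
pytorch_max = [8.284667]
for name, value in (("onnxruntime", [8.284672]), ("tensorrt fp16", [8.278888])):
    same, diff = compare_outputs(pytorch_max, value)
    print(f"{name}: max |diff| vs PyTorch = {diff:.6f}")
```

Checking the top-1 index and the maximum absolute difference separately makes it easier to tell a harmless precision difference from a genuinely wrong conversion.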

Environment

TensorRT Version: 7.0.0.11
GPU Type: GTX 2070 8GB
Nvidia Driver Version: 460.73.01
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.3.1
Baremetal or Container (if container which image + tag):

Relevant Files

All py, onnx, and trt files are here:
google drive

trtexec

command: trtexec --onnx=resnest14d_s.onnx --saveEngine=resnest14d_s.trt --fp16 --explicitBatch
&&&& RUNNING TensorRT.trtexec # /media/hsien/tensorRT/install/TensorRT-7.0.0.11/bin/trtexec --onnx=/media/hsien/tensorRT/resnest14d_s.onnx --saveEngine=/media/hsien/tensorRT/resnet14d_s.trt --fp16 --explicitBatch
[04/28/2021-16:27:23] [I] === Model Options ===
[04/28/2021-16:27:23] [I] Format: ONNX
[04/28/2021-16:27:23] [I] Model: /media/hsien/tensorRT/resnest14d_s.onnx
[04/28/2021-16:27:23] [I] Output:
[04/28/2021-16:27:23] [I] === Build Options ===
[04/28/2021-16:27:23] [I] Max batch: explicit
[04/28/2021-16:27:23] [I] Workspace: 16 MB
[04/28/2021-16:27:23] [I] minTiming: 1
[04/28/2021-16:27:23] [I] avgTiming: 8
[04/28/2021-16:27:23] [I] Precision: FP16
[04/28/2021-16:27:23] [I] Calibration:
[04/28/2021-16:27:23] [I] Safe mode: Disabled
[04/28/2021-16:27:23] [I] Save engine: /media/hsien/tensorRT/resnet14d_s.trt
[04/28/2021-16:27:23] [I] Load engine:
[04/28/2021-16:27:23] [I] Inputs format: fp32:CHW
[04/28/2021-16:27:23] [I] Outputs format: fp32:CHW
[04/28/2021-16:27:23] [I] Input build shapes: model
[04/28/2021-16:27:23] [I] === System Options ===
[04/28/2021-16:27:23] [I] Device: 0
[04/28/2021-16:27:23] [I] DLACore:
[04/28/2021-16:27:23] [I] Plugins:
[04/28/2021-16:27:23] [I] === Inference Options ===
[04/28/2021-16:27:23] [I] Batch: Explicit
[04/28/2021-16:27:23] [I] Iterations: 10
[04/28/2021-16:27:23] [I] Duration: 3s (+ 200ms warm up)
[04/28/2021-16:27:23] [I] Sleep time: 0ms
[04/28/2021-16:27:23] [I] Streams: 1
[04/28/2021-16:27:23] [I] ExposeDMA: Disabled
[04/28/2021-16:27:23] [I] Spin-wait: Disabled
[04/28/2021-16:27:23] [I] Multithreading: Disabled
[04/28/2021-16:27:23] [I] CUDA Graph: Disabled
[04/28/2021-16:27:23] [I] Skip inference: Disabled
[04/28/2021-16:27:23] [I] Inputs:
[04/28/2021-16:27:23] [I] === Reporting Options ===
[04/28/2021-16:27:23] [I] Verbose: Disabled
[04/28/2021-16:27:23] [I] Averages: 10 inferences
[04/28/2021-16:27:23] [I] Percentile: 99
[04/28/2021-16:27:23] [I] Dump output: Disabled
[04/28/2021-16:27:23] [I] Profile: Disabled
[04/28/2021-16:27:23] [I] Export timing to JSON file:
[04/28/2021-16:27:23] [I] Export output to JSON file:
[04/28/2021-16:27:23] [I] Export profile to JSON file:
[04/28/2021-16:27:23] [I]
----------------------------------------------------------------
Input filename: /media/hsien/tensorRT/resnest14d_s.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.3
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[04/28/2021-16:27:25] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[04/28/2021-16:27:48] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/28/2021-16:27:48] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[04/28/2021-16:27:51] [I] Warmup completed 0 queries over 200 ms
[04/28/2021-16:27:51] [I] Timing trace has 0 queries over 3.0137 s
[04/28/2021-16:27:51] [I] Trace averages of 10 runs:
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 5.01885 ms - Host latency: 5.15036 ms (end to end 9.79618 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.56866 ms - Host latency: 4.6776 ms (end to end 8.71171 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.60496 ms - Host latency: 4.71108 ms (end to end 9.05927 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.73467 ms - Host latency: 4.85085 ms (end to end 9.3425 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.68143 ms - Host latency: 4.80391 ms (end to end 9.09819 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.7803 ms - Host latency: 4.86614 ms (end to end 9.33675 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.53147 ms - Host latency: 4.60175 ms (end to end 8.83579 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.65472 ms - Host latency: 4.75435 ms (end to end 9.12504 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.49711 ms - Host latency: 4.57805 ms (end to end 8.46733 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.38116 ms - Host latency: 4.45335 ms (end to end 8.56692 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22346 ms - Host latency: 4.29333 ms (end to end 8.24498 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18707 ms - Host latency: 4.26896 ms (end to end 8.21243 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18265 ms - Host latency: 4.28536 ms (end to end 7.89745 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21424 ms - Host latency: 4.29061 ms (end to end 8.2432 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21951 ms - Host latency: 4.35684 ms (end to end 8.20419 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20006 ms - Host latency: 4.33803 ms (end to end 8.24309 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.30435 ms - Host latency: 4.44499 ms (end to end 8.35511 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21157 ms - Host latency: 4.32996 ms (end to end 8.28438 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.2169 ms - Host latency: 4.30469 ms (end to end 8.23054 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18958 ms - Host latency: 4.2589 ms (end to end 8.24014 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20555 ms - Host latency: 4.2755 ms (end to end 8.22488 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19561 ms - Host latency: 4.26597 ms (end to end 8.21317 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21375 ms - Host latency: 4.28456 ms (end to end 8.24779 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18981 ms - Host latency: 4.31053 ms (end to end 8.19866 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22507 ms - Host latency: 4.33169 ms (end to end 8.26803 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18712 ms - Host latency: 4.25585 ms (end to end 7.9356 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.28182 ms - Host latency: 4.35649 ms (end to end 8.06406 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.50676 ms - Host latency: 4.62809 ms (end to end 8.84784 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19648 ms - Host latency: 4.30699 ms (end to end 8.25563 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22816 ms - Host latency: 4.30033 ms (end to end 8.2545 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.54932 ms - Host latency: 4.62684 ms (end to end 8.95779 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.61113 ms - Host latency: 4.68608 ms (end to end 9.05883 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.68635 ms - Host latency: 4.77278 ms (end to end 9.12974 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.46609 ms - Host latency: 4.5587 ms (end to end 8.83785 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20592 ms - Host latency: 4.27491 ms (end to end 8.31342 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.61998 ms - Host latency: 4.70216 ms (end to end 9.012 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.53343 ms - Host latency: 4.6228 ms (end to end 8.99227 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.40498 ms - Host latency: 4.50452 ms (end to end 8.55096 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.32549 ms - Host latency: 4.39685 ms (end to end 8.44919 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18707 ms - Host latency: 4.27657 ms (end to end 8.1975 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21287 ms - Host latency: 4.28912 ms (end to end 8.25288 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19008 ms - Host latency: 4.26011 ms (end to end 8.19014 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21372 ms - Host latency: 4.28585 ms (end to end 8.24791 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20935 ms - Host latency: 4.27932 ms (end to end 8.21533 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20618 ms - Host latency: 4.27688 ms (end to end 8.24875 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22021 ms - Host latency: 4.29072 ms (end to end 8.2259 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19929 ms - Host latency: 4.26938 ms (end to end 8.25813 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22664 ms - Host latency: 4.3499 ms (end to end 8.23345 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18787 ms - Host latency: 4.25745 ms (end to end 8.23013 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.2114 ms - Host latency: 4.30042 ms (end to end 8.20288 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19229 ms - Host latency: 4.26489 ms (end to end 8.06702 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21621 ms - Host latency: 4.31855 ms (end to end 8.22141 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18652 ms - Host latency: 4.32917 ms (end to end 8.20715 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22429 ms - Host latency: 4.36472 ms (end to end 8.2668 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19597 ms - Host latency: 4.33701 ms (end to end 8.20193 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.23828 ms - Host latency: 4.32671 ms (end to end 8.29736 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19995 ms - Host latency: 4.26958 ms (end to end 8.20959 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21943 ms - Host latency: 4.29695 ms (end to end 8.2655 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19236 ms - Host latency: 4.26213 ms (end to end 8.20923 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.23306 ms - Host latency: 4.33921 ms (end to end 8.28213 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21404 ms - Host latency: 4.35471 ms (end to end 8.21985 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.34009 ms - Host latency: 4.44229 ms (end to end 8.51135 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.23359 ms - Host latency: 4.36431 ms (end to end 8.23723 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18853 ms - Host latency: 4.33606 ms (end to end 8.20444 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22261 ms - Host latency: 4.30564 ms (end to end 8.27012 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19524 ms - Host latency: 4.2656 ms (end to end 8.2126 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21357 ms - Host latency: 4.29067 ms (end to end 8.22153 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19138 ms - Host latency: 4.33374 ms (end to end 8.21853 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21252 ms - Host latency: 4.35369 ms (end to end 8.23562 ms)
[04/28/2021-16:27:51] [I] Host latency
[04/28/2021-16:27:51] [I] min: 4.15906 ms (end to end 5.35564 ms)
[04/28/2021-16:27:51] [I] max: 6.05525 ms (end to end 10.7888 ms)
[04/28/2021-16:27:51] [I] mean: 4.41058 ms (end to end 8.4237 ms)
[04/28/2021-16:27:51] [I] median: 4.26581 ms (end to end 8.3382 ms)
[04/28/2021-16:27:51] [I] percentile: 5.41858 ms at 99% (end to end 10.1203 ms at 99%)
[04/28/2021-16:27:51] [I] throughput: 0 qps
[04/28/2021-16:27:51] [I] walltime: 3.0137 s
[04/28/2021-16:27:51] [I] GPU Compute
[04/28/2021-16:27:51] [I] min: 4.08984 ms
[04/28/2021-16:27:51] [I] max: 5.91461 ms
[04/28/2021-16:27:51] [I] mean: 4.31508 ms
[04/28/2021-16:27:51] [I] median: 4.14777 ms
[04/28/2021-16:27:51] [I] percentile: 5.27188 ms at 99%
[04/28/2021-16:27:51] [I] total compute time: 3.0033 s
&&&& PASSED TensorRT.trtexec # /media/hsien/tensorRT/install/TensorRT-7.0.0.11/bin/trtexec --onnx=/media/hsien/tensorRT/resnest14d_s.onnx --saveEngine=/media/hsien/tensorRT/resnet14d_s.trt --fp16 --explicitBatch

Hi @qe9031211,
Could you please try the latest TRT release (TRT 7.2.3) with the latest CUDA?
Maybe you can try the NGC container:
https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/running.html

Please let us know if the issue persists in the latest release as well.

Thanks

Thank you for your reply.

I used the TensorRT 7.2.3 container → tensorRT container

Converting the ONNX file to a TRT engine succeeded with no warnings! (onnx-simplifier is not needed.)

However, I tried three times (onnx → trt → inference),
running inference on an image and using torch.max to get the max value:
8.2890625 / 8.28125 / 8.2734375 (different), class index: 207 (the same)

So I tried inference with ONNX Runtime and got the max value:
8.27716 / 8.27716 / 8.27716 (stable), class index: 207

The latest version, 7.2.3, is relatively stable!
But is this result right? It is slightly different after every conversion.

Sorry for the late response, and thanks for your help.

The variation seems to be very minor, ~0.01.
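As a side observation, the three TensorRT max values reported above (8.2890625 / 8.28125 / 8.2734375) are spaced exactly 0.0078125 apart, which is the gap between neighbouring FP16 numbers in the range 8 to 16. That is consistent with the output passing through half precision and each rebuilt engine landing one rounding step apart, although this is an inference from the numbers and not something the thread confirms. A minimal sketch using only Python's standard library (`to_fp16` and `fp16_ulp` are hypothetical helper names):

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_ulp(x: float) -> float:
    """Gap between x (rounded to FP16) and the next larger FP16 value.

    Valid for positive finite values below the FP16 maximum.
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    next_up = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return next_up - to_fp16(x)


# Max values reported for three separate onnx -> trt -> inference runs.
trt_values = [8.2890625, 8.28125, 8.2734375]

print("FP16 step near 8.28:", fp16_ulp(8.28125))
print("exactly representable in FP16:", [to_fp16(v) == v for v in trt_values])
print("gaps between runs:", [a - b for a, b in zip(trt_values, trt_values[1:])])
```

If this reading is right, the run-to-run spread is one or two FP16 rounding steps, which is about the smallest difference a half-precision output can show at this magnitude.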
I would recommend creating a TRT engine file once and serializing it, so that for every inference you can reuse the same engine by deserializing the TRT engine file.

Thanks
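The advice above (build once, then deserialize the saved engine for every run) could look roughly like the following with the TensorRT Python API. This is a sketch under assumptions: `load_engine` is a hypothetical helper name, the engine file is one already written by trtexec with --saveEngine, and the file must be loaded with the same TensorRT version and GPU type that built it.

```python
from pathlib import Path


def load_engine(engine_path):
    """Deserialize a previously built TensorRT engine from disk.

    Returns (runtime, engine); the runtime is returned as well so the
    caller can keep it alive for as long as the engine is in use.
    """
    # Read the file first so a wrong path fails before TensorRT is touched.
    serialized = Path(engine_path).read_bytes()

    import tensorrt as trt  # imported lazily; requires TensorRT to be installed

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(serialized)
    if engine is None:
        raise RuntimeError(f"could not deserialize {engine_path}")
    return runtime, engine


# Usage (path taken from the trtexec log in this thread):
# runtime, engine = load_engine("/media/hsien/tensorRT/resnet14d_s.trt")
```

Because the same serialized engine is reused, the kernels chosen at build time stay fixed, so repeated runs are no longer affected by differences between one conversion and the next.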


I appreciate your help very much!
