Description
Hi,
I used the resnest14d model from PyTorch Image Models (timm) to produce an ONNX model and a TensorRT engine. Below are my steps and results:
- Define the resnest14d model in PyTorch with pretrained=True, load an image to test, and get an output tensor of shape [1, 1000]. Using torch.max, the max value is 8.284667 and the class index is 207.
- Convert this PyTorch model to an ONNX model: successful. Inference with ONNX Runtime gives a max value of 8.284672 and class index 207, which is great because the results are very close.
- Convert this ONNX model to a TRT engine: unsuccessful.
Python and C++ (trtexec) give the same error:
ERROR: Failed to parse the ONNX file.
In node 39 (importGlobalAveragePool): UNSUPPORTED_NODE: Assertion failed: !isDynamic(kernelSize) && "Cannot run global average pool on an input with dynamic spatial dimensions!"
- However, TensorRT 7.0 supports global average pooling according to the support matrix, so I simplified the ONNX model with the onnx-simplifier package and converted it to a TRT engine again. This time it succeeded.
command: trtexec --onnx=resnest14d_s.onnx --saveEngine=resnest14d_s.trt --fp16 --explicitBatch
- Running inference on an image with this TRT engine gives different results (when I repeat step 4).
Sometimes I get a result very close to what I expect (max value: 8.278888, class: 207), but I don't know why, and that confuses me.
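As a rough yardstick for how close "very close" is, the three reported max values can be compared against the spacing of FP16 numbers at that magnitude. This is only a sketch: the helper name fp16_spacing is made up for illustration, and it assumes the engine's final layers actually ran in FP16 (the --fp16 flag permits FP16 kernels but does not force them). Error also accumulates through the network, so a single FP16 step is an optimistic bound, and this check says nothing about why results change between runs.

```python
import math

def fp16_spacing(x):
    """Gap between adjacent FP16 values near x (FP16 normal range only)."""
    exponent = math.floor(math.log2(abs(x)))
    # FP16 stores 10 explicit mantissa bits, so the step is 2**(exponent - 10).
    return 2.0 ** (exponent - 10)

# Max values reported in the steps above.
pytorch_max = 8.284667   # PyTorch
onnxrt_max = 8.284672    # ONNX Runtime
trt_fp16_max = 8.278888  # TensorRT engine built with --fp16 (the "good" run)

spacing = fp16_spacing(pytorch_max)
diff_onnx = abs(onnxrt_max - pytorch_max)
diff_trt = abs(trt_fp16_max - pytorch_max)

print(f"FP16 spacing near {pytorch_max}: {spacing}")
print(f"ONNX Runtime vs PyTorch: {diff_onnx:.6f}")
print(f"TensorRT FP16 vs PyTorch: {diff_trt:.6f}")
print("TensorRT difference is below one FP16 step:", diff_trt < spacing)
```

Under these assumptions the "good" TensorRT run differs from PyTorch by less than one FP16 step, so that run looks consistent with reduced precision rather than with a wrong engine.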
I've hit a snag here. Do you have any recommendations?
Thank you!
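One way to pin down the run-to-run variation described above is to call the same inference routine several times on the same image and count the distinct results. The sketch below is an assumption-laden illustration: infer is a hypothetical callable standing in for the TensorRT inference code (not shown in this post), and it is assumed to return a flat sequence of 1000 scores. A stub replaces it here so the snippet runs on its own.

```python
from collections import Counter

def check_repeatability(infer, image, runs=20, places=4):
    """Call infer(image) repeatedly and count distinct (class, max value) pairs.

    infer is a hypothetical callable returning a flat sequence of scores.
    """
    seen = Counter()
    for _ in range(runs):
        scores = list(infer(image))
        top = max(range(len(scores)), key=scores.__getitem__)
        seen[(top, round(scores[top], places))] += 1
    return seen

# Stand-in for the real engine call, for illustration only.
def fake_infer(image):
    return [0.1, 8.278888, 0.3]

results = check_repeatability(fake_infer, image=None, runs=5)
for (cls, value), count in results.items():
    print(f"class {cls}, max {value}: {count} run(s)")
```

If a single engine file yields more than one entry for the same image, the variation comes from the inference code or the runtime rather than from rebuilding the engine.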
Environment
TensorRT Version: 7.0.0.11
GPU Type: RTX 2070 8GB
Nvidia Driver Version: 460.73.01
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.3.1
Baremetal or Container (if container which image + tag):
Relevant Files
All .py, .onnx, and .trt files are here:
Google Drive
trtexec
command: trtexec --onnx=resnest14d_s.onnx --saveEngine=resnest14d_s.trt --fp16 --explicitBatch
&&&& RUNNING TensorRT.trtexec # /media/hsien/tensorRT/install/TensorRT-7.0.0.11/bin/trtexec --onnx=/media/hsien/tensorRT/resnest14d_s.onnx --saveEngine=/media/hsien/tensorRT/resnet14d_s.trt --fp16 --explicitBatch
[04/28/2021-16:27:23] [I] === Model Options ===
[04/28/2021-16:27:23] [I] Format: ONNX
[04/28/2021-16:27:23] [I] Model: /media/hsien/tensorRT/resnest14d_s.onnx
[04/28/2021-16:27:23] [I] Output:
[04/28/2021-16:27:23] [I] === Build Options ===
[04/28/2021-16:27:23] [I] Max batch: explicit
[04/28/2021-16:27:23] [I] Workspace: 16 MB
[04/28/2021-16:27:23] [I] minTiming: 1
[04/28/2021-16:27:23] [I] avgTiming: 8
[04/28/2021-16:27:23] [I] Precision: FP16
[04/28/2021-16:27:23] [I] Calibration:
[04/28/2021-16:27:23] [I] Safe mode: Disabled
[04/28/2021-16:27:23] [I] Save engine: /media/hsien/tensorRT/resnet14d_s.trt
[04/28/2021-16:27:23] [I] Load engine:
[04/28/2021-16:27:23] [I] Inputs format: fp32:CHW
[04/28/2021-16:27:23] [I] Outputs format: fp32:CHW
[04/28/2021-16:27:23] [I] Input build shapes: model
[04/28/2021-16:27:23] [I] === System Options ===
[04/28/2021-16:27:23] [I] Device: 0
[04/28/2021-16:27:23] [I] DLACore:
[04/28/2021-16:27:23] [I] Plugins:
[04/28/2021-16:27:23] [I] === Inference Options ===
[04/28/2021-16:27:23] [I] Batch: Explicit
[04/28/2021-16:27:23] [I] Iterations: 10
[04/28/2021-16:27:23] [I] Duration: 3s (+ 200ms warm up)
[04/28/2021-16:27:23] [I] Sleep time: 0ms
[04/28/2021-16:27:23] [I] Streams: 1
[04/28/2021-16:27:23] [I] ExposeDMA: Disabled
[04/28/2021-16:27:23] [I] Spin-wait: Disabled
[04/28/2021-16:27:23] [I] Multithreading: Disabled
[04/28/2021-16:27:23] [I] CUDA Graph: Disabled
[04/28/2021-16:27:23] [I] Skip inference: Disabled
[04/28/2021-16:27:23] [I] Inputs:
[04/28/2021-16:27:23] [I] === Reporting Options ===
[04/28/2021-16:27:23] [I] Verbose: Disabled
[04/28/2021-16:27:23] [I] Averages: 10 inferences
[04/28/2021-16:27:23] [I] Percentile: 99
[04/28/2021-16:27:23] [I] Dump output: Disabled
[04/28/2021-16:27:23] [I] Profile: Disabled
[04/28/2021-16:27:23] [I] Export timing to JSON file:
[04/28/2021-16:27:23] [I] Export output to JSON file:
[04/28/2021-16:27:23] [I] Export profile to JSON file:
[04/28/2021-16:27:23] [I]
----------------------------------------------------------------
Input filename: /media/hsien/tensorRT/resnest14d_s.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.3
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[04/28/2021-16:27:23] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-16:27:23] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[04/28/2021-16:27:25] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[04/28/2021-16:27:48] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/28/2021-16:27:48] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[04/28/2021-16:27:51] [I] Warmup completed 0 queries over 200 ms
[04/28/2021-16:27:51] [I] Timing trace has 0 queries over 3.0137 s
[04/28/2021-16:27:51] [I] Trace averages of 10 runs:
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 5.01885 ms - Host latency: 5.15036 ms (end to end 9.79618 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.56866 ms - Host latency: 4.6776 ms (end to end 8.71171 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.60496 ms - Host latency: 4.71108 ms (end to end 9.05927 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.73467 ms - Host latency: 4.85085 ms (end to end 9.3425 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.68143 ms - Host latency: 4.80391 ms (end to end 9.09819 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.7803 ms - Host latency: 4.86614 ms (end to end 9.33675 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.53147 ms - Host latency: 4.60175 ms (end to end 8.83579 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.65472 ms - Host latency: 4.75435 ms (end to end 9.12504 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.49711 ms - Host latency: 4.57805 ms (end to end 8.46733 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.38116 ms - Host latency: 4.45335 ms (end to end 8.56692 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22346 ms - Host latency: 4.29333 ms (end to end 8.24498 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18707 ms - Host latency: 4.26896 ms (end to end 8.21243 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18265 ms - Host latency: 4.28536 ms (end to end 7.89745 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21424 ms - Host latency: 4.29061 ms (end to end 8.2432 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21951 ms - Host latency: 4.35684 ms (end to end 8.20419 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20006 ms - Host latency: 4.33803 ms (end to end 8.24309 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.30435 ms - Host latency: 4.44499 ms (end to end 8.35511 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21157 ms - Host latency: 4.32996 ms (end to end 8.28438 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.2169 ms - Host latency: 4.30469 ms (end to end 8.23054 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18958 ms - Host latency: 4.2589 ms (end to end 8.24014 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20555 ms - Host latency: 4.2755 ms (end to end 8.22488 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19561 ms - Host latency: 4.26597 ms (end to end 8.21317 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21375 ms - Host latency: 4.28456 ms (end to end 8.24779 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18981 ms - Host latency: 4.31053 ms (end to end 8.19866 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22507 ms - Host latency: 4.33169 ms (end to end 8.26803 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18712 ms - Host latency: 4.25585 ms (end to end 7.9356 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.28182 ms - Host latency: 4.35649 ms (end to end 8.06406 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.50676 ms - Host latency: 4.62809 ms (end to end 8.84784 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19648 ms - Host latency: 4.30699 ms (end to end 8.25563 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22816 ms - Host latency: 4.30033 ms (end to end 8.2545 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.54932 ms - Host latency: 4.62684 ms (end to end 8.95779 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.61113 ms - Host latency: 4.68608 ms (end to end 9.05883 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.68635 ms - Host latency: 4.77278 ms (end to end 9.12974 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.46609 ms - Host latency: 4.5587 ms (end to end 8.83785 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20592 ms - Host latency: 4.27491 ms (end to end 8.31342 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.61998 ms - Host latency: 4.70216 ms (end to end 9.012 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.53343 ms - Host latency: 4.6228 ms (end to end 8.99227 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.40498 ms - Host latency: 4.50452 ms (end to end 8.55096 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.32549 ms - Host latency: 4.39685 ms (end to end 8.44919 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18707 ms - Host latency: 4.27657 ms (end to end 8.1975 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21287 ms - Host latency: 4.28912 ms (end to end 8.25288 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19008 ms - Host latency: 4.26011 ms (end to end 8.19014 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21372 ms - Host latency: 4.28585 ms (end to end 8.24791 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20935 ms - Host latency: 4.27932 ms (end to end 8.21533 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.20618 ms - Host latency: 4.27688 ms (end to end 8.24875 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22021 ms - Host latency: 4.29072 ms (end to end 8.2259 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19929 ms - Host latency: 4.26938 ms (end to end 8.25813 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22664 ms - Host latency: 4.3499 ms (end to end 8.23345 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18787 ms - Host latency: 4.25745 ms (end to end 8.23013 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.2114 ms - Host latency: 4.30042 ms (end to end 8.20288 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19229 ms - Host latency: 4.26489 ms (end to end 8.06702 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21621 ms - Host latency: 4.31855 ms (end to end 8.22141 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18652 ms - Host latency: 4.32917 ms (end to end 8.20715 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22429 ms - Host latency: 4.36472 ms (end to end 8.2668 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19597 ms - Host latency: 4.33701 ms (end to end 8.20193 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.23828 ms - Host latency: 4.32671 ms (end to end 8.29736 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19995 ms - Host latency: 4.26958 ms (end to end 8.20959 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21943 ms - Host latency: 4.29695 ms (end to end 8.2655 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19236 ms - Host latency: 4.26213 ms (end to end 8.20923 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.23306 ms - Host latency: 4.33921 ms (end to end 8.28213 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21404 ms - Host latency: 4.35471 ms (end to end 8.21985 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.34009 ms - Host latency: 4.44229 ms (end to end 8.51135 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.23359 ms - Host latency: 4.36431 ms (end to end 8.23723 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.18853 ms - Host latency: 4.33606 ms (end to end 8.20444 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.22261 ms - Host latency: 4.30564 ms (end to end 8.27012 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19524 ms - Host latency: 4.2656 ms (end to end 8.2126 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21357 ms - Host latency: 4.29067 ms (end to end 8.22153 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.19138 ms - Host latency: 4.33374 ms (end to end 8.21853 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.21252 ms - Host latency: 4.35369 ms (end to end 8.23562 ms)
[04/28/2021-16:27:51] [I] Host latency
[04/28/2021-16:27:51] [I] min: 4.15906 ms (end to end 5.35564 ms)
[04/28/2021-16:27:51] [I] max: 6.05525 ms (end to end 10.7888 ms)
[04/28/2021-16:27:51] [I] mean: 4.41058 ms (end to end 8.4237 ms)
[04/28/2021-16:27:51] [I] median: 4.26581 ms (end to end 8.3382 ms)
[04/28/2021-16:27:51] [I] percentile: 5.41858 ms at 99% (end to end 10.1203 ms at 99%)
[04/28/2021-16:27:51] [I] throughput: 0 qps
[04/28/2021-16:27:51] [I] walltime: 3.0137 s
[04/28/2021-16:27:51] [I] GPU Compute
[04/28/2021-16:27:51] [I] min: 4.08984 ms
[04/28/2021-16:27:51] [I] max: 5.91461 ms
[04/28/2021-16:27:51] [I] mean: 4.31508 ms
[04/28/2021-16:27:51] [I] median: 4.14777 ms
[04/28/2021-16:27:51] [I] percentile: 5.27188 ms at 99%
[04/28/2021-16:27:51] [I] total compute time: 3.0033 s
&&&& PASSED TensorRT.trtexec # /media/hsien/tensorRT/install/TensorRT-7.0.0.11/bin/trtexec --onnx=/media/hsien/tensorRT/resnest14d_s.onnx --saveEngine=/media/hsien/tensorRT/resnet14d_s.trt --fp16 --explicitBatch
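To condense the per-window latency lines in a log like the one above, a small parser can average them. This is a sketch that assumes the exact line format shown in this log ("Average on N runs - GPU latency: ... ms - Host latency: ... ms (end to end ... ms)"); other trtexec versions may print something different. The function name summarize_windows is made up for illustration, and the sample text is the first two such lines copied from the log.

```python
import re
from statistics import mean

# Matches lines such as:
# "... [I] Average on 10 runs - GPU latency: 5.01885 ms - Host latency: 5.15036 ms (end to end 9.79618 ms)"
WINDOW_RE = re.compile(
    r"Average on (\d+) runs - GPU latency: ([\d.]+) ms"
    r" - Host latency: ([\d.]+) ms \(end to end ([\d.]+) ms\)"
)

def summarize_windows(log_text):
    """Return the mean GPU, host, and end-to-end latency (ms) over all windows."""
    rows = [
        tuple(float(v) for v in m.groups()[1:])  # drop the run count
        for m in WINDOW_RE.finditer(log_text)
    ]
    if not rows:
        raise ValueError("no 'Average on N runs' lines found")
    gpu, host, end_to_end = zip(*rows)
    return {
        "windows": len(rows),
        "gpu_ms": mean(gpu),
        "host_ms": mean(host),
        "end_to_end_ms": mean(end_to_end),
    }

sample = """\
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 5.01885 ms - Host latency: 5.15036 ms (end to end 9.79618 ms)
[04/28/2021-16:27:51] [I] Average on 10 runs - GPU latency: 4.56866 ms - Host latency: 4.6776 ms (end to end 8.71171 ms)
"""

summary = summarize_windows(sample)
print(summary)
```

Passing the full log text instead of sample gives one averaged figure per latency type, which is easier to compare between builds than the raw list.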