Hi AastaLLL,
We tried to run trtexec on the GPU; the command is as follows:
trtexec --onnx=yolov3_608.onnx --workspace=26 --int8
and the resulting output is:
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --int8
[00/10/2020-15:36:16] [I] === Model Options ===
[00/10/2020-15:36:16] [I] Format: ONNX
[00/10/2020-15:36:16] [I] Model: /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
[00/10/2020-15:36:16] [I] Output:
[00/10/2020-15:36:16] [I] === Build Options ===
[00/10/2020-15:36:16] [I] Max batch: 1
[00/10/2020-15:36:16] [I] Workspace: 26 MB
[00/10/2020-15:36:16] [I] minTiming: 1
[00/10/2020-15:36:16] [I] avgTiming: 8
[00/10/2020-15:36:16] [I] Precision: INT8
[00/10/2020-15:36:16] [I] Calibration: Dynamic
[00/10/2020-15:36:16] [I] Safe mode: Disabled
[00/10/2020-15:36:16] [I] Save engine:
[00/10/2020-15:36:16] [I] Load engine:
[00/10/2020-15:36:16] [I] Inputs format: fp32:CHW
[00/10/2020-15:36:16] [I] Outputs format: fp32:CHW
[00/10/2020-15:36:16] [I] Input build shapes: model
[00/10/2020-15:36:16] [I] === System Options ===
[00/10/2020-15:36:16] [I] Device: 0
[00/10/2020-15:36:16] [I] DLACore:
[00/10/2020-15:36:16] [I] Plugins:
[00/10/2020-15:36:16] [I] === Inference Options ===
[00/10/2020-15:36:16] [I] Batch: 1
[00/10/2020-15:36:16] [I] Iterations: 10 (200 ms warm up)
[00/10/2020-15:36:16] [I] Duration: 10s
[00/10/2020-15:36:16] [I] Sleep time: 0ms
[00/10/2020-15:36:16] [I] Streams: 1
[00/10/2020-15:36:16] [I] Spin-wait: Disabled
[00/10/2020-15:36:16] [I] Multithreading: Enabled
[00/10/2020-15:36:16] [I] CUDA Graph: Disabled
[00/10/2020-15:36:16] [I] Skip inference: Disabled
[00/10/2020-15:36:16] [I] Input inference shapes: model
[00/10/2020-15:36:16] [I] === Reporting Options ===
[00/10/2020-15:36:16] [I] Verbose: Disabled
[00/10/2020-15:36:16] [I] Averages: 10 inferences
[00/10/2020-15:36:16] [I] Percentile: 99
[00/10/2020-15:36:16] [I] Dump output: Disabled
[00/10/2020-15:36:16] [I] Profile: Disabled
[00/10/2020-15:36:16] [I] Export timing to JSON file:
[00/10/2020-15:36:16] [I] Export profile to JSON file:
[00/10/2020-15:36:16] [I]
----------------------------------------------------------------
Input filename: /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: NVIDIA TensorRT sample
Producer version:
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[00/10/2020-15:36:18] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[00/10/2020-15:36:18] [I] [TRT]
[00/10/2020-15:36:18] [I] [TRT] --------------- Layers running on DLA:
[00/10/2020-15:36:18] [I] [TRT]
[00/10/2020-15:36:18] [I] [TRT] --------------- Layers running on GPU:
[00/10/2020-15:36:18] [I] [TRT] (Unnamed Layer* 0) [Convolution], (Unnamed Layer* 2) [Activation], (Unnamed Layer* 3) [Convolution], (Unnamed Layer* 5) [Activation], (Unnamed Layer* 6) [Convolution], (Unnamed Layer* 8) [Activation], (Unnamed Layer* 9) [Convolution], (Unnamed Layer* 11) [Activation], (Unnamed Layer* 12) [ElementWise], (Unnamed Layer* 13) [Convolution], (Unnamed Layer* 15) [Activation], (Unnamed Layer* 16) [Convolution], (Unnamed Layer* 18) [Activation], (Unnamed Layer* 19) [Convolution], (Unnamed Layer* 21) [Activation], (Unnamed Layer* 22) [ElementWise], (Unnamed Layer* 23) [Convolution], (Unnamed Layer* 25) [Activation], (Unnamed Layer* 26) [Convolution], (Unnamed Layer* 28) [Activation], (Unnamed Layer* 29) [ElementWise], (Unnamed Layer* 30) [Convolution], (Unnamed Layer* 32) [Activation], (Unnamed Layer* 33) [Convolution], (Unnamed Layer* 35) [Activation], (Unnamed Layer* 36) [Convolution], (Unnamed Layer* 38) [Activation], (Unnamed Layer* 39) [ElementWise], (Unnamed Layer* 40) [Convolution], (Unnamed Layer* 42) [Activation], (Unnamed Layer* 43) [Convolution], (Unnamed Layer* 45) [Activation], (Unnamed Layer* 46) [ElementWise], (Unnamed Layer* 47) [Convolution], (Unnamed Layer* 49) [Activation], (Unnamed Layer* 50) [Convolution], (Unnamed Layer* 52) [Activation], (Unnamed Layer* 53) [ElementWise], (Unnamed Layer* 54) [Convolution], (Unnamed Layer* 56) [Activation], (Unnamed Layer* 57) [Convolution], (Unnamed Layer* 59) [Activation], (Unnamed Layer* 60) [ElementWise], (Unnamed Layer* 61) [Convolution], (Unnamed Layer* 63) [Activation], (Unnamed Layer* 64) [Convolution], (Unnamed Layer* 66) [Activation], (Unnamed Layer* 67) [ElementWise], (Unnamed Layer* 68) [Convolution], (Unnamed Layer* 70) [Activation], (Unnamed Layer* 71) [Convolution], (Unnamed Layer* 73) [Activation], (Unnamed Layer* 74) [ElementWise], (Unnamed Layer* 75) [Convolution], (Unnamed Layer* 77) [Activation], (Unnamed Layer* 78) [Convolution], (Unnamed Layer* 80) [Activation], (Unnamed Layer* 81) [ElementWise], (Unnamed Layer* 82) [Convolution], (Unnamed Layer* 84) [Activation], (Unnamed Layer* 85) [Convolution], (Unnamed Layer* 87) [Activation], (Unnamed Layer* 88) [ElementWise], (Unnamed Layer* 89) [Convolution], (Unnamed Layer* 91) [Activation], (Unnamed Layer* 92) [Convolution], (Unnamed Layer* 94) [Activation], (Unnamed Layer* 95) [Convolution], (Unnamed Layer* 97) [Activation], (Unnamed Layer* 98) [ElementWise], (Unnamed Layer* 99) [Convolution], (Unnamed Layer* 101) [Activation], (Unnamed Layer* 102) [Convolution], (Unnamed Layer* 104) [Activation], (Unnamed Layer* 105) [ElementWise], (Unnamed Layer* 106) [Convolution], (Unnamed Layer* 108) [Activation], (Unnamed Layer* 109) [Convolution], (Unnamed Layer* 111) [Activation], (Unnamed Layer* 112) [ElementWise], (Unnamed Layer* 113) [Convolution], (Unnamed Layer* 115) [Activation], (Unnamed Layer* 116) [Convolution], (Unnamed Layer* 118) [Activation], (Unnamed Layer* 119) [ElementWise], (Unnamed Layer* 120) [Convolution], (Unnamed Layer* 122) [Activation], (Unnamed Layer* 123) [Convolution], (Unnamed Layer* 125) [Activation], (Unnamed Layer* 126) [ElementWise], (Unnamed Layer* 127) [Convolution], (Unnamed Layer* 129) [Activation], (Unnamed Layer* 130) [Convolution], (Unnamed Layer* 132) [Activation], (Unnamed Layer* 133) [ElementWise], (Unnamed Layer* 134) [Convolution], (Unnamed Layer* 136) [Activation], (Unnamed Layer* 137) [Convolution], (Unnamed Layer* 139) [Activation], (Unnamed Layer* 140) [ElementWise], (Unnamed Layer* 141) [Convolution], 
(Unnamed Layer* 143) [Activation], (Unnamed Layer* 144) [Convolution], (Unnamed Layer* 146) [Activation], (Unnamed Layer* 147) [ElementWise], (Unnamed Layer* 148) [Convolution], (Unnamed Layer* 150) [Activation], (Unnamed Layer* 151) [Convolution], (Unnamed Layer* 153) [Activation], (Unnamed Layer* 154) [Convolution], (Unnamed Layer* 156) [Activation], (Unnamed Layer* 157) [ElementWise], (Unnamed Layer* 158) [Convolution], (Unnamed Layer* 160) [Activation], (Unnamed Layer* 161) [Convolution], (Unnamed Layer* 163) [Activation], (Unnamed Layer* 164) [ElementWise], (Unnamed Layer* 165) [Convolution], (Unnamed Layer* 167) [Activation], (Unnamed Layer* 168) [Convolution], (Unnamed Layer* 170) [Activation], (Unnamed Layer* 171) [ElementWise], (Unnamed Layer* 172) [Convolution], (Unnamed Layer* 174) [Activation], (Unnamed Layer* 175) [Convolution], (Unnamed Layer* 177) [Activation], (Unnamed Layer* 178) [ElementWise], (Unnamed Layer* 179) [Convolution], (Unnamed Layer* 181) [Activation], (Unnamed Layer* 182) [Convolution], (Unnamed Layer* 184) [Activation], (Unnamed Layer* 185) [Convolution], (Unnamed Layer* 187) [Activation], (Unnamed Layer* 188) [Convolution], (Unnamed Layer* 190) [Activation], (Unnamed Layer* 191) [Convolution], (Unnamed Layer* 193) [Activation], (Unnamed Layer* 194) [Convolution], (Unnamed Layer* 196) [Activation], (Unnamed Layer* 197) [Convolution], (Unnamed Layer* 198) [Convolution], (Unnamed Layer* 200) [Activation], (Unnamed Layer* 201) [Resize], 086_upsample copy, (Unnamed Layer* 203) [Convolution], (Unnamed Layer* 205) [Activation], (Unnamed Layer* 206) [Convolution], (Unnamed Layer* 208) [Activation], (Unnamed Layer* 209) [Convolution], (Unnamed Layer* 211) [Activation], (Unnamed Layer* 212) [Convolution], (Unnamed Layer* 214) [Activation], (Unnamed Layer* 215) [Convolution], (Unnamed Layer* 217) [Activation], (Unnamed Layer* 218) [Convolution], (Unnamed Layer* 220) [Activation], (Unnamed Layer* 221) [Convolution], (Unnamed Layer* 222) [Convolution], (Unnamed Layer* 224) [Activation], (Unnamed Layer* 225) [Resize], 098_upsample copy, (Unnamed Layer* 227) [Convolution], (Unnamed Layer* 229) [Activation], (Unnamed Layer* 230) [Convolution], (Unnamed Layer* 232) [Activation], (Unnamed Layer* 233) [Convolution], (Unnamed Layer* 235) [Activation], (Unnamed Layer* 236) [Convolution], (Unnamed Layer* 238) [Activation], (Unnamed Layer* 239) [Convolution], (Unnamed Layer* 241) [Activation], (Unnamed Layer* 242) [Convolution], (Unnamed Layer* 244) [Activation], (Unnamed Layer* 245) [Convolution],
[00/10/2020-15:36:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[00/10/2020-15:39:46] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[00/10/2020-15:39:47] [I] Average over 10 runs is 25.7592 ms (host walltime is 25.8521 ms, 99% percentile time is 26.4878).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.8782 ms (host walltime is 27.0092 ms, 99% percentile time is 33.6098).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.711 ms (host walltime is 26.8539 ms, 99% percentile time is 33.9163).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.6445 ms (host walltime is 26.7936 ms, 99% percentile time is 32.8204).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.6838 ms (host walltime is 26.8311 ms, 99% percentile time is 33.1121).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.7551 ms (host walltime is 26.8875 ms, 99% percentile time is 33.6251).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.9908 ms (host walltime is 27.1109 ms, 99% percentile time is 32.6678).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.7439 ms (host walltime is 26.8641 ms, 99% percentile time is 33.1622).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.9623 ms (host walltime is 27.0811 ms, 99% percentile time is 33.2608).
[00/10/2020-15:39:50] [I] Average over 10 runs is 26.7927 ms (host walltime is 26.8805 ms, 99% percentile time is 33.184).
&&&& PASSED TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --int8
We then tried to run trtexec with DLA; the command is as follows:
trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8
[00/10/2020-16:04:24] [I] === Model Options ===
[00/10/2020-16:04:24] [I] Format: ONNX
[00/10/2020-16:04:24] [I] Model: /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
[00/10/2020-16:04:24] [I] Output:
[00/10/2020-16:04:24] [I] === Build Options ===
[00/10/2020-16:04:24] [I] Max batch: 1
[00/10/2020-16:04:24] [I] Workspace: 26 MB
[00/10/2020-16:04:24] [I] minTiming: 1
[00/10/2020-16:04:24] [I] avgTiming: 8
[00/10/2020-16:04:24] [I] Precision: INT8
[00/10/2020-16:04:24] [I] Calibration: Dynamic
[00/10/2020-16:04:24] [I] Safe mode: Disabled
[00/10/2020-16:04:24] [I] Save engine:
[00/10/2020-16:04:24] [I] Load engine:
[00/10/2020-16:04:24] [I] Inputs format: fp32:CHW
[00/10/2020-16:04:24] [I] Outputs format: fp32:CHW
[00/10/2020-16:04:24] [I] Input build shapes: model
[00/10/2020-16:04:24] [I] === System Options ===
[00/10/2020-16:04:24] [I] Device: 0
[00/10/2020-16:04:24] [I] DLACore: 1
[00/10/2020-16:04:24] [I] Plugins:
[00/10/2020-16:04:24] [I] === Inference Options ===
[00/10/2020-16:04:24] [I] Batch: 1
[00/10/2020-16:04:24] [I] Iterations: 10 (200 ms warm up)
[00/10/2020-16:04:24] [I] Duration: 10s
[00/10/2020-16:04:24] [I] Sleep time: 0ms
[00/10/2020-16:04:24] [I] Streams: 1
[00/10/2020-16:04:24] [I] Spin-wait: Disabled
[00/10/2020-16:04:24] [I] Multithreading: Enabled
[00/10/2020-16:04:24] [I] CUDA Graph: Disabled
[00/10/2020-16:04:24] [I] Skip inference: Disabled
[00/10/2020-16:04:24] [I] Input inference shapes: model
[00/10/2020-16:04:24] [I] === Reporting Options ===
[00/10/2020-16:04:24] [I] Verbose: Disabled
[00/10/2020-16:04:24] [I] Averages: 10 inferences
[00/10/2020-16:04:24] [I] Percentile: 99
[00/10/2020-16:04:24] [I] Dump output: Disabled
[00/10/2020-16:04:24] [I] Profile: Disabled
[00/10/2020-16:04:24] [I] Export timing to JSON file:
[00/10/2020-16:04:24] [I] Export profile to JSON file:
[00/10/2020-16:04:24] [I]
----------------------------------------------------------------
Input filename: /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: NVIDIA TensorRT sample
Producer version:
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[00/10/2020-16:04:26] [E] [TRT] (Unnamed Layer* 2) [Activation]: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
[00/10/2020-16:04:26] [E] [TRT] Default DLA is enabled but layer (Unnamed Layer* 2) [Activation] is not supported on DLA and falling back to GPU is not enabled.
[00/10/2020-16:04:26] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8
Some layers, such as LEAKY_RELU, are reported as not supported on DLA, but the support matrix document says LEAKY_RELU is supported: https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html
How can we solve this issue?
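For reference, one workaround we are considering (not yet verified on our side) is to let the unsupported layers fall back to the GPU. Assuming the --allowGPUFallback option listed in trtexec --help behaves as described, the command would be something like:

trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8 --allowGPUFallback

However, this would only move the unsupported layers back to the GPU; what we would like is to run LEAKY_RELU on DLA, as the support matrix seems to indicate is possible.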