&&&& RUNNING TensorRT.trtexec # trtexec --onnx=softmax_test.onnx --verbose --dumpOutput --batch=1 --safe
[09/07/2019-06:41:05] [I] === Model Options ===
[09/07/2019-06:41:05] [I] Format: ONNX
[09/07/2019-06:41:05] [I] Model: softmax_test.onnx
[09/07/2019-06:41:05] [I] Output:
[09/07/2019-06:41:05] [I] === Build Options ===
[09/07/2019-06:41:05] [I] Max batch: 1
[09/07/2019-06:41:05] [I] Workspace: 16 MB
[09/07/2019-06:41:05] [I] minTiming: 1
[09/07/2019-06:41:05] [I] avgTiming: 8
[09/07/2019-06:41:05] [I] Precision: FP32
[09/07/2019-06:41:05] [I] Calibration:
[09/07/2019-06:41:05] [I] Safe mode: Enabled
[09/07/2019-06:41:05] [I] Save engine:
[09/07/2019-06:41:05] [I] Load engine:
[09/07/2019-06:41:05] [I] Inputs format: fp32:CHW
[09/07/2019-06:41:05] [I] Outputs format: fp32:CHW
[09/07/2019-06:41:05] [I] Input build shapes: model
[09/07/2019-06:41:05] [I] === System Options ===
[09/07/2019-06:41:05] [I] Device: 0
[09/07/2019-06:41:05] [I] DLACore:
[09/07/2019-06:41:05] [I] Plugins:
[09/07/2019-06:41:05] [I] === Inference Options ===
[09/07/2019-06:41:05] [I] Batch: 1
[09/07/2019-06:41:05] [I] Iterations: 10 (200 ms warm up)
[09/07/2019-06:41:05] [I] Duration: 10s
[09/07/2019-06:41:05] [I] Sleep time: 0ms
[09/07/2019-06:41:05] [I] Streams: 1
[09/07/2019-06:41:05] [I] Spin-wait: Disabled
[09/07/2019-06:41:05] [I] Multithreading: Enabled
[09/07/2019-06:41:05] [I] CUDA Graph: Disabled
[09/07/2019-06:41:05] [I] Skip inference: Disabled
[09/07/2019-06:41:05] [I] Input inference shapes: model
[09/07/2019-06:41:05] [I] === Reporting Options ===
[09/07/2019-06:41:05] [I] Verbose: Enabled
[09/07/2019-06:41:05] [I] Averages: 10 inferences
[09/07/2019-06:41:05] [I] Percentile: 99
[09/07/2019-06:41:05] [I] Dump output: Enabled
[09/07/2019-06:41:05] [I] Profile: Disabled
[09/07/2019-06:41:05] [I] Export timing to JSON file:
[09/07/2019-06:41:05] [I] Export profile to JSON file:
[09/07/2019-06:41:05] [I]
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - NMS_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - Reorg_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - Region_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - Clip_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - LReLU_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - PriorBox_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - Normalize_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - RPROI_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[09/07/2019-06:41:05] [V] [TRT] Plugin Creator registration succeeded - FlattenConcat_TRT
----------------------------------------------------------------
Input filename: softmax_test.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: MACNICA
Producer version: 0.1
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[09/07/2019-06:41:05] [V] [TRT] Output:Softmax -> (1, 3, 4, 5)
----- Parsing of ONNX model softmax_test.onnx is Done ----
[09/07/2019-06:41:05] [V] [TRT] Applying generic optimizations to the graph for inference.
[09/07/2019-06:41:05] [V] [TRT] Original: 1 layers
[09/07/2019-06:41:05] [V] [TRT] After dead-layer removal: 1 layers
[09/07/2019-06:41:05] [V] [TRT] After scale fusion: 1 layers
[09/07/2019-06:41:05] [V] [TRT] After vertical fusions: 1 layers
[09/07/2019-06:41:05] [V] [TRT] After final dead-layer removal: 1 layers
[09/07/2019-06:41:05] [V] [TRT] After tensor merging: 1 layers
[09/07/2019-06:41:05] [V] [TRT] After concat removal: 1 layers
[09/07/2019-06:41:05] [V] [TRT] Graph construction and optimization completed in 0.000168271 seconds.
[09/07/2019-06:41:06] [V] [TRT] Constructing optimization profile number 0 out of 1
*************** Autotuning format combination: Float(1,5,20,60,60) -> Float(1,5,20,60,60) ***************
[09/07/2019-06:41:06] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 0) [Softmax] (SoftMax)
[09/07/2019-06:41:06] [V] [TRT] Tactic: 1001 time 0.007168
[09/07/2019-06:41:06] [V] [TRT] Fastest Tactic: 1001 Time: 0.007168
[09/07/2019-06:41:06] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: SoftMax Tactic: 1001
[09/07/2019-06:41:06] [V] [TRT]
[09/07/2019-06:41:06] [V] [TRT] Formats and tactics selection completed in 0.00268491 seconds.
[09/07/2019-06:41:06] [V] [TRT] After reformat layers: 1 layers
[09/07/2019-06:41:06] [V] [TRT] Block size 16777216
[09/07/2019-06:41:06] [V] [TRT] Total Activation Memory: 16777216
[09/07/2019-06:41:06] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/07/2019-06:41:06] [V] [TRT] Engine generation completed in 1.42884 seconds.
[09/07/2019-06:41:06] [V] [TRT] Engine Layer Information:
[09/07/2019-06:41:06] [V] [TRT] Layer: (Unnamed Layer* 0) [Softmax] (SoftMax), Tactic: 1001, Input[Float(1,3,4,5)] -> Output[Float(1,3,4,5)]
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0096256 ms (host walltime is 0.0693703 ms, 99% percentile time is 0.021504).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.007984 ms (host walltime is 0.0652598 ms, 99% percentile time is 0.01024).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0081888 ms (host walltime is 0.0651534 ms, 99% percentile time is 0.011232).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.007984 ms (host walltime is 0.0650814 ms, 99% percentile time is 0.009184).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.009728 ms (host walltime is 0.0694965 ms, 99% percentile time is 0.022528).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0079872 ms (host walltime is 0.0616338 ms, 99% percentile time is 0.01024).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0079872 ms (host walltime is 0.05462 ms, 99% percentile time is 0.01024).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0084 ms (host walltime is 0.0539838 ms, 99% percentile time is 0.01024).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0084992 ms (host walltime is 0.0544292 ms, 99% percentile time is 0.01024).
[09/07/2019-06:41:06] [I] Average over 10 runs is 0.0080896 ms (host walltime is 0.0488964 ms, 99% percentile time is 0.009216).
[09/07/2019-06:41:06] [I] Dumping output tensor Output:
[09/07/2019-06:41:06] [I] [1, 1, 3, 4, 5]
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
[09/07/2019-06:41:06] [I] 0.333333 0.333333 0.333333 0.333333 0.333333
&&&& PASSED TensorRT.trtexec # trtexec --onnx=softmax_test.onnx --verbose --dumpOutput --batch=1 --safe