$ ./bin/trtexec --onnx=./data/mnist/mnist.onnx --explicitBatch=1 --dumpProfile
&&&& RUNNING TensorRT.trtexec # ./bin/trtexec --onnx=./data/mnist/mnist.onnx --explicitBatch=1 --dumpProfile
[04/22/2021-06:47:12] [I] === Model Options ===
[04/22/2021-06:47:12] [I] Format: ONNX
[04/22/2021-06:47:12] [I] Model: ./data/mnist/mnist.onnx
[04/22/2021-06:47:12] [I] Output:
[04/22/2021-06:47:12] [I] === Build Options ===
[04/22/2021-06:47:12] [I] Max batch: explicit
[04/22/2021-06:47:12] [I] Workspace: 16 MB
[04/22/2021-06:47:12] [I] minTiming: 1
[04/22/2021-06:47:12] [I] avgTiming: 8
[04/22/2021-06:47:12] [I] Precision: FP32
[04/22/2021-06:47:12] [I] Calibration:
[04/22/2021-06:47:12] [I] Safe mode: Disabled
[04/22/2021-06:47:12] [I] Save engine:
[04/22/2021-06:47:12] [I] Load engine:
[04/22/2021-06:47:12] [I] Inputs format: fp32:CHW
[04/22/2021-06:47:12] [I] Outputs format: fp32:CHW
[04/22/2021-06:47:12] [I] Input build shapes: model
[04/22/2021-06:47:12] [I] === System Options ===
[04/22/2021-06:47:12] [I] Device: 0
[04/22/2021-06:47:12] [I] DLACore:
[04/22/2021-06:47:12] [I] Plugins:
[04/22/2021-06:47:12] [I] === Inference Options ===
[04/22/2021-06:47:12] [I] Batch: 1
[04/22/2021-06:47:12] [I] Input inference shapes: model
[04/22/2021-06:47:12] [I] Iterations: 10 (200 ms warm up)
[04/22/2021-06:47:12] [I] Duration: 10s
[04/22/2021-06:47:12] [I] Sleep time: 0ms
[04/22/2021-06:47:12] [I] Streams: 1
[04/22/2021-06:47:12] [I] Spin-wait: Disabled
[04/22/2021-06:47:12] [I] Multithreading: Enabled
[04/22/2021-06:47:12] [I] CUDA Graph: Disabled
[04/22/2021-06:47:12] [I] Skip inference: Disabled
[04/22/2021-06:47:12] [I] Consistency: Disabled
[04/22/2021-06:47:12] [I] === Reporting Options ===
[04/22/2021-06:47:12] [I] Verbose: Disabled
[04/22/2021-06:47:12] [I] Averages: 10 inferences
[04/22/2021-06:47:12] [I] Percentile: 99
[04/22/2021-06:47:12] [I] Dump output: Disabled
[04/22/2021-06:47:12] [I] Profile: Enabled
[04/22/2021-06:47:12] [I] Export timing to JSON file:
[04/22/2021-06:47:12] [I] Export profile to JSON file:
[04/22/2021-06:47:12] [I]
----------------------------------------------------------------
Input filename:   ./data/mnist/mnist.onnx
ONNX IR version:  0.0.3
Opset version:    8
Producer name:    CNTK
Producer version: 2.5.1
Domain:           ai.cntk
Model version:    1
Doc string:
----------------------------------------------------------------
[04/22/2021-06:47:12] [W] [TRT] onnx2trt_utils.cpp:194: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/22/2021-06:47:12] [W] [TRT] onnx2trt_utils.cpp:194: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/22/2021-06:47:14] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0757248 ms (host walltime is 0.0890465 ms, 99% percentile time is 0.138976).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0681728 ms (host walltime is 0.0833204 ms, 99% percentile time is 0.070432).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0726464 ms (host walltime is 0.0862418 ms, 99% percentile time is 0.096256).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0677056 ms (host walltime is 0.0778302 ms, 99% percentile time is 0.07088).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.071744 ms (host walltime is 0.0854714 ms, 99% percentile time is 0.11056).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0673472 ms (host walltime is 0.0804563 ms, 99% percentile time is 0.070272).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.067568 ms (host walltime is 0.0775848 ms, 99% percentile time is 0.071232).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0699552 ms (host walltime is 0.080596 ms, 99% percentile time is 0.09504).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0675296 ms (host walltime is 0.0775819 ms, 99% percentile time is 0.070528).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.067776 ms (host walltime is 0.078145 ms, 99% percentile time is 0.069984).
[04/22/2021-06:47:14] [I] Host wallTime
[04/22/2021-06:47:14] [I] min: 0.076922 ms
[04/22/2021-06:47:14] [I] max: 0.167616 ms
[04/22/2021-06:47:14] [I] median: 0.079709 ms
[04/22/2021-06:47:14] [I] GPU compute
[04/22/2021-06:47:14] [I] min: 0.067168 ms
[04/22/2021-06:47:14] [I] max: 0.138976 ms
[04/22/2021-06:47:14] [I] median: 0.06864 ms
[04/22/2021-06:47:14] [I] ========== Layer time profile ==========
[04/22/2021-06:47:14] [I] TensorRT layer name                          Runtime, %   Invocations   Runtime, ms
[04/22/2021-06:47:14] [I] (Unnamed Layer* 1) [Shuffle]                      11.5%           100          0.64
[04/22/2021-06:47:14] [I] Convolution28                                     11.1%           100          0.61
[04/22/2021-06:47:14] [I] (Unnamed Layer* 4) [Shuffle]                       7.0%           100          0.39
[04/22/2021-06:47:14] [I] (Unnamed Layer* 5) [ElementWise] + ReLU32          8.3%           100          0.46
[04/22/2021-06:47:14] [I] Pooling66                                          8.3%           100          0.46
[04/22/2021-06:47:14] [I] Convolution110                                    11.9%           100          0.66
[04/22/2021-06:47:14] [I] (Unnamed Layer* 10) [Shuffle]                      7.4%           100          0.41
[04/22/2021-06:47:14] [I] (Unnamed Layer* 11) [ElementWise] + ReLU114        8.3%           100          0.46
[04/22/2021-06:47:14] [I] Pooling160                                         8.5%           100          0.47
[04/22/2021-06:47:14] [I] Times212                                           9.9%           100          0.55
[04/22/2021-06:47:14] [I] (Unnamed Layer* 17) [ElementWise]                  7.8%           100          0.43
[04/22/2021-06:47:14] [I] ========== Layer time total runtime = 5.5473 ms ==========
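The same per-layer timings that --dumpProfile prints can also be collected programmatically through TensorRT's IProfiler callback. The sketch below (Python, binding-style API from the TensorRT 7.x era, matching the April 2021 log above) is only an illustration: the engine file name "mnist.engine", the iteration count, and the zero-filled input are assumptions, not part of the run shown here. An engine could be serialized first with, for example, ./bin/trtexec --onnx=./data/mnist/mnist.onnx --explicitBatch=1 --saveEngine=mnist.engine.

# Minimal per-layer profiling sketch using trt.IProfiler (assumed setup, see note above).
import numpy as np
import pycuda.autoinit          # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


class LayerTimeProfiler(trt.IProfiler):
    """Accumulates the per-layer GPU times TensorRT reports after each execution."""

    def __init__(self):
        super().__init__()
        self.times = {}   # layer name -> accumulated time, ms
        self.calls = {}   # layer name -> invocation count

    def report_layer_time(self, layer_name, ms):
        # TensorRT calls this once per layer for every profiled execution.
        self.times[layer_name] = self.times.get(layer_name, 0.0) + ms
        self.calls[layer_name] = self.calls.get(layer_name, 0) + 1


def main(engine_path="mnist.engine", iterations=100):
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    context = engine.create_execution_context()
    profiler = LayerTimeProfiler()
    context.profiler = profiler

    # Allocate one device buffer per engine binding; inputs are left as zeros,
    # which is enough for timing purposes.
    bindings = []
    for i in range(engine.num_bindings):
        shape = engine.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = np.zeros(trt.volume(shape), dtype=dtype)
        device = cuda.mem_alloc(host.nbytes)
        cuda.memcpy_htod(device, host)
        bindings.append(int(device))

    for _ in range(iterations):
        context.execute_v2(bindings)   # synchronous execution triggers report_layer_time

    total = sum(profiler.times.values())
    print(f"{'TensorRT layer name':45s} {'Runtime, %':>10s} {'Invocations':>12s} {'Runtime, ms':>12s}")
    for name, ms in profiler.times.items():
        print(f"{name:45s} {100.0 * ms / total:9.1f}% {profiler.calls[name]:12d} {ms:12.2f}")
    print(f"Layer time total runtime = {total:.4f} ms")


if __name__ == "__main__":
    main()

Note that the profiler is attached to the execution context, not the engine, and is only invoked for synchronous execute calls; the percentage and total-runtime columns above are then simple aggregations of what report_layer_time receives, mirroring the layer time profile table in the trtexec output.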