$ ./bin/trtexec --onnx=./data/mnist/mnist.onnx --explicitBatch=1 --dumpProfile
&&&& RUNNING TensorRT.trtexec # ./bin/trtexec --onnx=./data/mnist/mnist.onnx --explicitBatch=1 --dumpProfile
[04/22/2021-06:47:12] [I] === Model Options ===
[04/22/2021-06:47:12] [I] Format: ONNX
[04/22/2021-06:47:12] [I] Model: ./data/mnist/mnist.onnx
[04/22/2021-06:47:12] [I] Output:
[04/22/2021-06:47:12] [I] === Build Options ===
[04/22/2021-06:47:12] [I] Max batch: explicit
[04/22/2021-06:47:12] [I] Workspace: 16 MB
[04/22/2021-06:47:12] [I] minTiming: 1
[04/22/2021-06:47:12] [I] avgTiming: 8
[04/22/2021-06:47:12] [I] Precision: FP32
[04/22/2021-06:47:12] [I] Calibration:
[04/22/2021-06:47:12] [I] Safe mode: Disabled
[04/22/2021-06:47:12] [I] Save engine:
[04/22/2021-06:47:12] [I] Load engine:
[04/22/2021-06:47:12] [I] Inputs format: fp32:CHW
[04/22/2021-06:47:12] [I] Outputs format: fp32:CHW
[04/22/2021-06:47:12] [I] Input build shapes: model
[04/22/2021-06:47:12] [I] === System Options ===
[04/22/2021-06:47:12] [I] Device: 0
[04/22/2021-06:47:12] [I] DLACore:
[04/22/2021-06:47:12] [I] Plugins:
[04/22/2021-06:47:12] [I] === Inference Options ===
[04/22/2021-06:47:12] [I] Batch: 1
[04/22/2021-06:47:12] [I] Input inference shapes: model
[04/22/2021-06:47:12] [I] Iterations: 10 (200 ms warm up)
[04/22/2021-06:47:12] [I] Duration: 10s
[04/22/2021-06:47:12] [I] Sleep time: 0ms
[04/22/2021-06:47:12] [I] Streams: 1
[04/22/2021-06:47:12] [I] Spin-wait: Disabled
[04/22/2021-06:47:12] [I] Multithreading: Enabled
[04/22/2021-06:47:12] [I] CUDA Graph: Disabled
[04/22/2021-06:47:12] [I] Skip inference: Disabled
[04/22/2021-06:47:12] [I] Consistency: Disabled
[04/22/2021-06:47:12] [I] === Reporting Options ===
[04/22/2021-06:47:12] [I] Verbose: Disabled
[04/22/2021-06:47:12] [I] Averages: 10 inferences
[04/22/2021-06:47:12] [I] Percentile: 99
[04/22/2021-06:47:12] [I] Dump output: Disabled
[04/22/2021-06:47:12] [I] Profile: Enabled
[04/22/2021-06:47:12] [I] Export timing to JSON file:
[04/22/2021-06:47:12] [I] Export profile to JSON file:
[04/22/2021-06:47:12] [I]
----------------------------------------------------------------
Input filename:   ./data/mnist/mnist.onnx
ONNX IR version:  0.0.3
Opset version:    8
Producer name:    CNTK
Producer version: 2.5.1
Domain:           ai.cntk
Model version:    1
Doc string:
----------------------------------------------------------------
[04/22/2021-06:47:12] [W] [TRT] onnx2trt_utils.cpp:194: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/22/2021-06:47:12] [W] [TRT] onnx2trt_utils.cpp:194: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/22/2021-06:47:14] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0757248 ms (host walltime is 0.0890465 ms, 99% percentile time is 0.138976).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0681728 ms (host walltime is 0.0833204 ms, 99% percentile time is 0.070432).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0726464 ms (host walltime is 0.0862418 ms, 99% percentile time is 0.096256).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0677056 ms (host walltime is 0.0778302 ms, 99% percentile time is 0.07088).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.071744 ms (host walltime is 0.0854714 ms, 99% percentile time is 0.11056).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0673472 ms (host walltime is 0.0804563 ms, 99% percentile time is 0.070272).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.067568 ms (host walltime is 0.0775848 ms, 99% percentile time is 0.071232).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0699552 ms (host walltime is 0.080596 ms, 99% percentile time is 0.09504).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.0675296 ms (host walltime is 0.0775819 ms, 99% percentile time is 0.070528).
[04/22/2021-06:47:14] [I] Average over 10 runs is 0.067776 ms (host walltime is 0.078145 ms, 99% percentile time is 0.069984).
[04/22/2021-06:47:14] [I] Host wallTime
[04/22/2021-06:47:14] [I] min: 0.076922 ms
[04/22/2021-06:47:14] [I] max: 0.167616 ms
[04/22/2021-06:47:14] [I] median: 0.079709 ms
[04/22/2021-06:47:14] [I] GPU compute
[04/22/2021-06:47:14] [I] min: 0.067168 ms
[04/22/2021-06:47:14] [I] max: 0.138976 ms
[04/22/2021-06:47:14] [I] median: 0.06864 ms
[04/22/2021-06:47:14] [I] ========== Layer time profile ==========
[04/22/2021-06:47:14] [I] TensorRT layer name                          Runtime, %   Invocations   Runtime, ms
[04/22/2021-06:47:14] [I] (Unnamed Layer* 1) [Shuffle]                      11.5%           100          0.64
[04/22/2021-06:47:14] [I] Convolution28                                     11.1%           100          0.61
[04/22/2021-06:47:14] [I] (Unnamed Layer* 4) [Shuffle]                       7.0%           100          0.39
[04/22/2021-06:47:14] [I] (Unnamed Layer* 5) [ElementWise] + ReLU32          8.3%           100          0.46
[04/22/2021-06:47:14] [I] Pooling66                                          8.3%           100          0.46
[04/22/2021-06:47:14] [I] Convolution110                                    11.9%           100          0.66
[04/22/2021-06:47:14] [I] (Unnamed Layer* 10) [Shuffle]                      7.4%           100          0.41
[04/22/2021-06:47:14] [I] (Unnamed Layer* 11) [ElementWise] + ReLU114        8.3%           100          0.46
[04/22/2021-06:47:14] [I] Pooling160                                         8.5%           100          0.47
[04/22/2021-06:47:14] [I] Times212                                           9.9%           100          0.55
[04/22/2021-06:47:14] [I] (Unnamed Layer* 17) [ElementWise]                  7.8%           100          0.43
[04/22/2021-06:47:14] [I] ========== Layer time total runtime = 5.5473 ms ==========
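The same per-layer timings that --dumpProfile prints can also be collected programmatically through TensorRT's IProfiler callback. The sketch below (Python, binding-style API from the TensorRT 7.x era, matching the April 2021 log above) is only an illustration: the engine file name "mnist.engine", the iteration count, and the zero-filled input are assumptions, not part of the run shown here. An engine could be serialized first with, for example, ./bin/trtexec --onnx=./data/mnist/mnist.onnx --explicitBatch=1 --saveEngine=mnist.engine.

# Minimal per-layer profiling sketch using trt.IProfiler (assumed setup, see note above).
import numpy as np
import pycuda.autoinit          # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


class LayerTimeProfiler(trt.IProfiler):
    """Accumulates the per-layer GPU times TensorRT reports after each execution."""

    def __init__(self):
        super().__init__()
        self.times = {}   # layer name -> accumulated time, ms
        self.calls = {}   # layer name -> invocation count

    def report_layer_time(self, layer_name, ms):
        # TensorRT calls this once per layer for every profiled execution.
        self.times[layer_name] = self.times.get(layer_name, 0.0) + ms
        self.calls[layer_name] = self.calls.get(layer_name, 0) + 1


def main(engine_path="mnist.engine", iterations=100):
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    context = engine.create_execution_context()
    profiler = LayerTimeProfiler()
    context.profiler = profiler

    # Allocate one device buffer per engine binding; inputs are left as zeros,
    # which is enough for timing purposes.
    bindings = []
    for i in range(engine.num_bindings):
        shape = engine.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = np.zeros(trt.volume(shape), dtype=dtype)
        device = cuda.mem_alloc(host.nbytes)
        cuda.memcpy_htod(device, host)
        bindings.append(int(device))

    for _ in range(iterations):
        context.execute_v2(bindings)   # synchronous execution triggers report_layer_time

    total = sum(profiler.times.values())
    print(f"{'TensorRT layer name':45s} {'Runtime, %':>10s} {'Invocations':>12s} {'Runtime, ms':>12s}")
    for name, ms in profiler.times.items():
        print(f"{name:45s} {100.0 * ms / total:9.1f}% {profiler.calls[name]:12d} {ms:12.2f}")
    print(f"Layer time total runtime = {total:.4f} ms")


if __name__ == "__main__":
    main()

Note that the profiler is attached to the execution context, not the engine, and is only invoked for synchronous execute calls; the percentage and total-runtime columns above are then simple aggregations of what report_layer_time receives, mirroring the layer time profile table in the trtexec output.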