time /usr/src/tensorrt/bin/trtexec --onnx=googlenet-12.onnx --saveEngine=googlenet-12_fp32.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=googlenet-12.onnx --saveEngine=googlenet-12_fp32.engine
[10/05/2022-19:46:25] [I] === Model Options ===
[10/05/2022-19:46:25] [I] Format: ONNX
[10/05/2022-19:46:25] [I] Model: googlenet-12.onnx
[10/05/2022-19:46:25] [I] Output:
[10/05/2022-19:46:25] [I] === Build Options ===
[10/05/2022-19:46:25] [I] Max batch: explicit batch
[10/05/2022-19:46:25] [I] Workspace: 16 MiB
[10/05/2022-19:46:25] [I] minTiming: 1
[10/05/2022-19:46:25] [I] avgTiming: 8
[10/05/2022-19:46:25] [I] Precision: FP32
[10/05/2022-19:46:25] [I] Calibration:
[10/05/2022-19:46:25] [I] Refit: Disabled
[10/05/2022-19:46:25] [I] Sparsity: Disabled
[10/05/2022-19:46:25] [I] Safe mode: Disabled
[10/05/2022-19:46:25] [I] DirectIO mode: Disabled
[10/05/2022-19:46:25] [I] Restricted mode: Disabled
[10/05/2022-19:46:25] [I] Save engine: googlenet-12_fp32.engine
[10/05/2022-19:46:25] [I] Load engine:
[10/05/2022-19:46:25] [I] Profiling verbosity: 0
[10/05/2022-19:46:25] [I] Tactic sources: Using default tactic sources
[10/05/2022-19:46:25] [I] timingCacheMode: local
[10/05/2022-19:46:25] [I] timingCacheFile:
[10/05/2022-19:46:25] [I] Input(s)s format: fp32:CHW
[10/05/2022-19:46:25] [I] Output(s)s format: fp32:CHW
[10/05/2022-19:46:25] [I] Input build shapes: model
[10/05/2022-19:46:25] [I] Input calibration shapes: model
[10/05/2022-19:46:25] [I] === System Options ===
[10/05/2022-19:46:25] [I] Device: 0
[10/05/2022-19:46:25] [I] DLACore:
[10/05/2022-19:46:25] [I] Plugins:
[10/05/2022-19:46:25] [I] === Inference Options ===
[10/05/2022-19:46:25] [I] Batch: Explicit
[10/05/2022-19:46:25] [I] Input inference shapes: model
[10/05/2022-19:46:25] [I] Iterations: 10
[10/05/2022-19:46:25] [I] Duration: 3s (+ 200ms warm up)
[10/05/2022-19:46:25] [I] Sleep time: 0ms
[10/05/2022-19:46:25] [I] Idle time: 0ms
[10/05/2022-19:46:25] [I] Streams: 1
[10/05/2022-19:46:25] [I] ExposeDMA: Disabled
[10/05/2022-19:46:25] [I] Data transfers: Enabled
[10/05/2022-19:46:25] [I] Spin-wait: Disabled
[10/05/2022-19:46:25] [I] Multithreading: Disabled
[10/05/2022-19:46:25] [I] CUDA Graph: Disabled
[10/05/2022-19:46:25] [I] Separate profiling: Disabled
[10/05/2022-19:46:25] [I] Time Deserialize: Disabled
[10/05/2022-19:46:25] [I] Time Refit: Disabled
[10/05/2022-19:46:25] [I] Skip inference: Disabled
[10/05/2022-19:46:25] [I] Inputs:
[10/05/2022-19:46:25] [I] === Reporting Options ===
[10/05/2022-19:46:25] [I] Verbose: Disabled
[10/05/2022-19:46:25] [I] Averages: 10 inferences
[10/05/2022-19:46:25] [I] Percentile: 99
[10/05/2022-19:46:25] [I] Dump refittable layers:Disabled
[10/05/2022-19:46:25] [I] Dump output: Disabled
[10/05/2022-19:46:25] [I] Profile: Disabled
[10/05/2022-19:46:25] [I] Export timing to JSON file:
[10/05/2022-19:46:25] [I] Export output to JSON file:
[10/05/2022-19:46:25] [I] Export profile to JSON file:
[10/05/2022-19:46:25] [I]
[10/05/2022-19:46:25] [I] === Device Information ===
[10/05/2022-19:46:25] [I] Selected Device: NVIDIA Tegra X1
[10/05/2022-19:46:25] [I] Compute Capability: 5.3
[10/05/2022-19:46:25] [I] SMs: 1
[10/05/2022-19:46:25] [I] Compute Clock Rate: 0.9216 GHz
[10/05/2022-19:46:25] [I] Device Global Memory: 3964 MiB
[10/05/2022-19:46:25] [I] Shared Memory per SM: 64 KiB
[10/05/2022-19:46:25] [I] Memory Bus Width: 64 bits (ECC disabled)
[10/05/2022-19:46:25] [I] Memory Clock Rate: 0.01275 GHz
[10/05/2022-19:46:25] [I]
[10/05/2022-19:46:25] [I] TensorRT version: 8.2.1
[10/05/2022-19:46:27] [I] [TRT] [MemUsageChange] Init CUDA: CPU +229, GPU +0, now: CPU 248, GPU 2055 (MiB)
[10/05/2022-19:46:27] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 2084 MiB
[10/05/2022-19:46:28] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 2117 MiB
[10/05/2022-19:46:28] [I] Start parsing network model
[10/05/2022-19:46:28] [I] [TRT] ----------------------------------------------------------------
[10/05/2022-19:46:28] [I] [TRT] Input filename: googlenet-12.onnx
[10/05/2022-19:46:28] [I] [TRT] ONNX IR version: 0.0.7
[10/05/2022-19:46:28] [I] [TRT] Opset version: 12
[10/05/2022-19:46:28] [I] [TRT] Producer name: onnx-caffe2
[10/05/2022-19:46:28] [I] [TRT] Producer version:
[10/05/2022-19:46:28] [I] [TRT] Domain:
[10/05/2022-19:46:28] [I] [TRT] Model version: 0
[10/05/2022-19:46:28] [I] [TRT] Doc string:
[10/05/2022-19:46:28] [I] [TRT] ----------------------------------------------------------------
[10/05/2022-19:46:28] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/05/2022-19:46:28] [I] Finish parsing network model
[10/05/2022-19:46:28] [I] [TRT] ---------- Layers Running on DLA ----------
[10/05/2022-19:46:28] [I] [TRT] ---------- Layers Running on GPU ----------
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_conv1/7x7_s2_1 + node_of_conv1/7x7_s2_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_pool1/3x3_s2_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_pool1/norm1_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_conv2/3x3_reduce_1 + node_of_conv2/3x3_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_conv2/3x3_1 + node_of_conv2/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_conv2/norm2_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_pool2/3x3_s2_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3a/1x1_1 + node_of_inception_3a/1x1_2 || node_of_inception_3a/3x3_reduce_1 + node_of_inception_3a/3x3_reduce_2 || node_of_inception_3a/5x5_reduce_1 + node_of_inception_3a/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3a/3x3_1 + node_of_inception_3a/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3a/5x5_1 + node_of_inception_3a/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3a/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3a/pool_proj_1 + node_of_inception_3a/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_3a/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3b/1x1_1 + node_of_inception_3b/1x1_2 || node_of_inception_3b/3x3_reduce_1 + node_of_inception_3b/3x3_reduce_2 || node_of_inception_3b/5x5_reduce_1 + node_of_inception_3b/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3b/3x3_1 + node_of_inception_3b/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3b/5x5_1 + node_of_inception_3b/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3b/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_3b/pool_proj_1 + node_of_inception_3b/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_3b/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_pool3/3x3_s2_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4a/1x1_1 + node_of_inception_4a/1x1_2 || node_of_inception_4a/3x3_reduce_1 + node_of_inception_4a/3x3_reduce_2 || node_of_inception_4a/5x5_reduce_1 + node_of_inception_4a/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4a/3x3_1 + node_of_inception_4a/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4a/5x5_1 + node_of_inception_4a/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4a/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4a/pool_proj_1 + node_of_inception_4a/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_4a/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4b/1x1_1 + node_of_inception_4b/1x1_2 || node_of_inception_4b/3x3_reduce_1 + node_of_inception_4b/3x3_reduce_2 || node_of_inception_4b/5x5_reduce_1 + node_of_inception_4b/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4b/3x3_1 + node_of_inception_4b/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4b/5x5_1 + node_of_inception_4b/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4b/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4b/pool_proj_1 + node_of_inception_4b/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_4b/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4c/1x1_1 + node_of_inception_4c/1x1_2 || node_of_inception_4c/3x3_reduce_1 + node_of_inception_4c/3x3_reduce_2 || node_of_inception_4c/5x5_reduce_1 + node_of_inception_4c/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4c/3x3_1 + node_of_inception_4c/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4c/5x5_1 + node_of_inception_4c/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4c/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4c/pool_proj_1 + node_of_inception_4c/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_4c/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4d/5x5_reduce_1 + node_of_inception_4d/5x5_reduce_2 || node_of_inception_4d/1x1_1 + node_of_inception_4d/1x1_2 || node_of_inception_4d/3x3_reduce_1 + node_of_inception_4d/3x3_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4d/3x3_1 + node_of_inception_4d/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4d/5x5_1 + node_of_inception_4d/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4d/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4d/pool_proj_1 + node_of_inception_4d/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_4d/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4e/1x1_1 + node_of_inception_4e/1x1_2 || node_of_inception_4e/3x3_reduce_1 + node_of_inception_4e/3x3_reduce_2 || node_of_inception_4e/5x5_reduce_1 + node_of_inception_4e/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4e/3x3_1 + node_of_inception_4e/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4e/5x5_1 + node_of_inception_4e/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4e/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_4e/pool_proj_1 + node_of_inception_4e/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_4e/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_pool4/3x3_s2_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5a/1x1_1 + node_of_inception_5a/1x1_2 || node_of_inception_5a/3x3_reduce_1 + node_of_inception_5a/3x3_reduce_2 || node_of_inception_5a/5x5_reduce_1 + node_of_inception_5a/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5a/3x3_1 + node_of_inception_5a/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5a/5x5_1 + node_of_inception_5a/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5a/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5a/pool_proj_1 + node_of_inception_5a/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_5a/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5b/1x1_1 + node_of_inception_5b/1x1_2 || node_of_inception_5b/3x3_reduce_1 + node_of_inception_5b/3x3_reduce_2 || node_of_inception_5b/5x5_reduce_1 + node_of_inception_5b/5x5_reduce_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5b/3x3_1 + node_of_inception_5b/3x3_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5b/5x5_1 + node_of_inception_5b/5x5_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5b/pool_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_inception_5b/pool_proj_1 + node_of_inception_5b/pool_proj_2
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] inception_5b/1x1_2 copy
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_pool5/7x7_s1_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] node_of_loss3/classifier_1
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] (Unnamed Layer* 143) [Shuffle] + (Unnamed Layer* 144) [Shuffle]
[10/05/2022-19:46:28] [I] [TRT] [GpuLayer] (Unnamed Layer* 145) [Softmax]
[10/05/2022-19:46:29] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +139, now: CPU 470, GPU 2314 (MiB)
[10/05/2022-19:46:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +245, now: CPU 711, GPU 2559 (MiB)
[10/05/2022-19:46:30] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/05/2022-19:46:42] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/05/2022-19:48:33] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[10/05/2022-19:48:34] [I] [TRT] Total Host Persistent Memory: 86496
[10/05/2022-19:48:34] [I] [TRT] Total Device Persistent Memory: 29362176
[10/05/2022-19:48:34] [I] [TRT] Total Scratch Memory: 0
[10/05/2022-19:48:34] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 10 MiB, GPU 96 MiB
[10/05/2022-19:48:34] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 3.36168ms to assign 4 blocks to 39 nodes requiring 7225344 bytes.
[10/05/2022-19:48:34] [I] [TRT] Total Activation Memory: 7225344
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 958, GPU 3161 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 959, GPU 3161 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +64, now: CPU 7, GPU 64 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 991, GPU 3202 (MiB)
[10/05/2022-19:48:34] [I] [TRT] Loaded engine size: 40 MiB
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 992, GPU 3202 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 992, GPU 3202 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +40, now: CPU 0, GPU 40 (MiB)
[10/05/2022-19:48:34] [I] Engine built in 128.515 sec.
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 894, GPU 3162 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 894, GPU 3162 (MiB)
[10/05/2022-19:48:34] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +35, now: CPU 0, GPU 75 (MiB)
[10/05/2022-19:48:34] [I] Using random values for input data_0
[10/05/2022-19:48:34] [I] Created input binding for data_0 with dimensions 1x3x224x224
[10/05/2022-19:48:34] [I] Using random values for output prob_1
[10/05/2022-19:48:34] [I] Created output binding for prob_1 with dimensions 1x1000
[10/05/2022-19:48:34] [I] Starting inference
[10/05/2022-19:48:37] [I] Warmup completed 10 queries over 200 ms
[10/05/2022-19:48:37] [I] Timing trace has 145 queries over 3.04025 s
[10/05/2022-19:48:37] [I]
[10/05/2022-19:48:37] [I] === Trace details ===
[10/05/2022-19:48:37] [I] Trace averages of 10 runs:
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9204 ms - Host latency: 20.9807 ms (end to end 20.9902 ms, enqueue 2.75513 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8748 ms - Host latency: 20.9357 ms (end to end 20.9452 ms, enqueue 2.67972 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9248 ms - Host latency: 20.9874 ms (end to end 20.997 ms, enqueue 2.75833 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8805 ms - Host latency: 20.94 ms (end to end 20.9496 ms, enqueue 2.5933 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9027 ms - Host latency: 20.9621 ms (end to end 20.9716 ms, enqueue 2.58264 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9223 ms - Host latency: 20.9829 ms (end to end 20.9928 ms, enqueue 2.67537 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8781 ms - Host latency: 20.9381 ms (end to end 20.9477 ms, enqueue 2.84126 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9084 ms - Host latency: 20.9676 ms (end to end 20.977 ms, enqueue 2.76918 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8788 ms - Host latency: 20.9402 ms (end to end 20.95 ms, enqueue 2.80714 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8864 ms - Host latency: 20.946 ms (end to end 20.9559 ms, enqueue 2.71272 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9141 ms - Host latency: 20.9751 ms (end to end 20.9847 ms, enqueue 2.67795 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8938 ms - Host latency: 20.9533 ms (end to end 20.9627 ms, enqueue 2.6917 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.9093 ms - Host latency: 20.97 ms (end to end 20.9797 ms, enqueue 2.67314 ms)
[10/05/2022-19:48:37] [I] Average on 10 runs - GPU latency: 20.8604 ms - Host latency: 20.9197 ms (end to end 20.9294 ms, enqueue 2.78374 ms)
[10/05/2022-19:48:37] [I]
[10/05/2022-19:48:37] [I] === Performance summary ===
[10/05/2022-19:48:37] [I] Throughput: 47.6935 qps
[10/05/2022-19:48:37] [I] Latency: min = 20.7125 ms, max = 21.2655 ms, mean = 20.957 ms, median = 20.9143 ms, percentile(99%) = 21.2529 ms
[10/05/2022-19:48:37] [I] End-to-End Host Latency: min = 20.7222 ms, max = 21.2754 ms, mean = 20.9666 ms, median = 20.9244 ms, percentile(99%) = 21.2629 ms
[10/05/2022-19:48:37] [I] Enqueue Time: min = 2.25146 ms, max = 3.62534 ms, mean = 2.71618 ms, median = 2.64417 ms, percentile(99%) = 3.61731 ms
[10/05/2022-19:48:37] [I] H2D Latency: min = 0.0550537 ms, max = 0.0807495 ms, mean = 0.0572903 ms, median = 0.0563965 ms, percentile(99%) = 0.0716553 ms
[10/05/2022-19:48:37] [I] GPU Compute Time: min = 20.6527 ms, max = 21.2067 ms, mean = 20.8967 ms, median = 20.8546 ms, percentile(99%) = 21.1943 ms
[10/05/2022-19:48:37] [I] D2H Latency: min = 0.00170898 ms, max = 0.00488281 ms, mean = 0.0030103 ms, median = 0.00292969 ms, percentile(99%) = 0.00415039 ms
[10/05/2022-19:48:37] [I] Total Host Walltime: 3.04025 s
[10/05/2022-19:48:37] [I] Total GPU Compute Time: 3.03002 s
[10/05/2022-19:48:37] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/05/2022-19:48:37] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=googlenet-12.onnx --saveEngine=googlenet-12_fp32.engine

real	2m12.575s
user	0m23.656s
sys	0m27.776s
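On the Nano this FP32 build takes a little over two minutes, and the resulting engine runs GoogLeNet at roughly 47.7 inferences per second (about 21 ms mean latency). If you want to run the saved googlenet-12_fp32.engine from your own code rather than through trtexec, the sketch below shows one minimal way to do it with the TensorRT Python API. It assumes the TensorRT Python bindings, numpy, and pycuda are installed on the Jetson, and it reuses the binding names and shapes reported in the log above (input data_0 of 1x3x224x224, output prob_1 of 1x1000); treat it as an illustration under those assumptions, not a drop-in deployment script.

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine that trtexec saved with --saveEngine.
with open("googlenet-12_fp32.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Host/device buffers matching the bindings reported by trtexec:
#   input  data_0 : 1x3x224x224 float32
#   output prob_1 : 1x1000      float32
h_input = np.random.random((1, 3, 224, 224)).astype(np.float32)  # replace with a real preprocessed image
h_output = np.empty((1, 1000), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()

# Copy input to the GPU, run inference, copy the result back, and wait for the stream.
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()

print("top-1 class index:", int(h_output.argmax()))

The same engine file can also be re-benchmarked later without rebuilding by passing --loadEngine=googlenet-12_fp32.engine to trtexec instead of --onnx.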