&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_250.onnx --fp16
[09/24/2021-17:54:11] [I] === Model Options ===
[09/24/2021-17:54:11] [I] Format: ONNX
[09/24/2021-17:54:11] [I] Model: /home/acer/nfs-share/epoch_250.onnx
[09/24/2021-17:54:11] [I] Output:
[09/24/2021-17:54:11] [I] === Build Options ===
[09/24/2021-17:54:11] [I] Max batch: explicit
[09/24/2021-17:54:11] [I] Workspace: 16 MiB
[09/24/2021-17:54:11] [I] minTiming: 1
[09/24/2021-17:54:11] [I] avgTiming: 8
[09/24/2021-17:54:11] [I] Precision: FP32+FP16
[09/24/2021-17:54:11] [I] Calibration:
[09/24/2021-17:54:11] [I] Refit: Disabled
[09/24/2021-17:54:11] [I] Sparsity: Disabled
[09/24/2021-17:54:11] [I] Safe mode: Disabled
[09/24/2021-17:54:11] [I] Restricted mode: Disabled
[09/24/2021-17:54:11] [I] Save engine:
[09/24/2021-17:54:11] [I] Load engine:
[09/24/2021-17:54:11] [I] NVTX verbosity: 0
[09/24/2021-17:54:11] [I] Tactic sources: Using default tactic sources
[09/24/2021-17:54:11] [I] timingCacheMode: local
[09/24/2021-17:54:11] [I] timingCacheFile:
[09/24/2021-17:54:11] [I] Input(s)s format: fp32:CHW
[09/24/2021-17:54:11] [I] Output(s)s format: fp32:CHW
[09/24/2021-17:54:11] [I] Input build shapes: model
[09/24/2021-17:54:11] [I] Input calibration shapes: model
[09/24/2021-17:54:11] [I] === System Options ===
[09/24/2021-17:54:11] [I] Device: 0
[09/24/2021-17:54:11] [I] DLACore:
[09/24/2021-17:54:11] [I] Plugins:
[09/24/2021-17:54:11] [I] === Inference Options ===
[09/24/2021-17:54:11] [I] Batch: Explicit
[09/24/2021-17:54:11] [I] Input inference shapes: model
[09/24/2021-17:54:11] [I] Iterations: 10
[09/24/2021-17:54:11] [I] Duration: 3s (+ 200ms warm up)
[09/24/2021-17:54:11] [I] Sleep time: 0ms
[09/24/2021-17:54:11] [I] Streams: 1
[09/24/2021-17:54:11] [I] ExposeDMA: Disabled
[09/24/2021-17:54:11] [I] Data transfers: Enabled
[09/24/2021-17:54:11] [I] Spin-wait: Disabled
[09/24/2021-17:54:11] [I] Multithreading: Disabled
[09/24/2021-17:54:11] [I] CUDA Graph: Disabled
[09/24/2021-17:54:11] [I] Separate profiling: Disabled
[09/24/2021-17:54:11] [I] Time Deserialize: Disabled
[09/24/2021-17:54:11] [I] Time Refit: Disabled
[09/24/2021-17:54:11] [I] Skip inference: Disabled
[09/24/2021-17:54:11] [I] Inputs:
[09/24/2021-17:54:11] [I] === Reporting Options ===
[09/24/2021-17:54:11] [I] Verbose: Disabled
[09/24/2021-17:54:11] [I] Averages: 10 inferences
[09/24/2021-17:54:11] [I] Percentile: 99
[09/24/2021-17:54:11] [I] Dump refittable layers:Disabled
[09/24/2021-17:54:11] [I] Dump output: Disabled
[09/24/2021-17:54:11] [I] Profile: Disabled
[09/24/2021-17:54:11] [I] Export timing to JSON file:
[09/24/2021-17:54:11] [I] Export output to JSON file:
[09/24/2021-17:54:11] [I] Export profile to JSON file:
[09/24/2021-17:54:11] [I]
[09/24/2021-17:54:11] [I] === Device Information ===
[09/24/2021-17:54:11] [I] Selected Device: Xavier
[09/24/2021-17:54:11] [I] Compute Capability: 7.2
[09/24/2021-17:54:11] [I] SMs: 6
[09/24/2021-17:54:11] [I] Compute Clock Rate: 1.109 GHz
[09/24/2021-17:54:11] [I] Device Global Memory: 7773 MiB
[09/24/2021-17:54:11] [I] Shared Memory per SM: 96 KiB
[09/24/2021-17:54:11] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/24/2021-17:54:11] [I] Memory Clock Rate: 1.109 GHz
[09/24/2021-17:54:11] [I]
[09/24/2021-17:54:11] [I] TensorRT version: 8001
[09/24/2021-17:54:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 3724 (MiB)
[09/24/2021-17:54:12] [I] Start parsing network model
[09/24/2021-17:54:12] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-17:54:12] [I] [TRT] Input filename: /home/acer/nfs-share/epoch_250.onnx
[09/24/2021-17:54:12] [I] [TRT] ONNX IR version: 0.0.6
[09/24/2021-17:54:12] [I] [TRT] Opset version: 13
[09/24/2021-17:54:12] [I] [TRT] Producer name: pytorch
[09/24/2021-17:54:12] [I] [TRT] Producer version: 1.8
[09/24/2021-17:54:12] [I] [TRT] Domain:
[09/24/2021-17:54:12] [I] [TRT] Model version: 0
[09/24/2021-17:54:12] [I] [TRT] Doc string:
[09/24/2021-17:54:12] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-17:54:12] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/24/2021-17:54:12] [I] Finish parsing network model
[09/24/2021-17:54:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 374, GPU 3730 (MiB)
[09/24/2021-17:54:12] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 374 MiB, GPU 3730 MiB
[09/24/2021-17:54:12] [I] [TRT] ---------- Layers Running on DLA ----------
[09/24/2021-17:54:12] [I] [TRT] ---------- Layers Running on GPU ----------
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_0
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_1
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_2
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_3
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_4
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_5
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_6
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_7
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_8
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_9
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_10
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_11
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_12
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_13
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_14
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_15
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_16
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_17
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_18
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_19
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_20
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_21
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_22
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_54
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_23
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_24
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_25
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_26
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_27
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_28
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_29
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_30
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_31
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_32
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_33
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_34
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_35
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_36
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_37
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_38
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_39
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_40
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_41
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_42
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_43
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_44
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_45
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_46
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_56
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_47
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_48
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_49
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_50
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_51
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_52
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_53
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_58
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_59
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_122 || Conv_123
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_124
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_125 || Conv_126
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Resize_78
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_127
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(LeakyRelu_57, Add_79)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_128
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_80
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 748 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 754 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_81
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(Relu_130)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_113 || Conv_114
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_149 || Conv_177 || Conv_205
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_115
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_116 || Conv_117
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Resize_100
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_118
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(LeakyRelu_55, Add_101)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_119
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_102
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 733 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 739 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_103
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(Relu_121)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_150 + Reshape_157
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_178 + Reshape_185
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_206 + Reshape_213
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_104 || Conv_105
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_140 || Conv_168 || Conv_196
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_106
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_107 || Conv_108
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_109
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_110
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 718 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 724 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(Relu_112)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_141 + Reshape_148
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_169 + Reshape_176
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_197 + Reshape_204
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_131 || Conv_159 || Conv_187
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_132 + Reshape_139
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_160 + Reshape_167
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_188 + Reshape_195
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 497 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 512 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 527 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 543 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 558 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 573 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 589 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 604 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 619 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Softmax_215
[09/24/2021-17:54:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +225, now: CPU 601, GPU 3955 (MiB)
[09/24/2021-17:54:14] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +309, now: CPU 908, GPU 4264 (MiB)
[09/24/2021-17:54:14] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[09/24/2021-17:56:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/24/2021-18:03:18] [I] [TRT] Detected 1 inputs and 9 output network tensors.
[09/24/2021-18:03:18] [I] [TRT] Total Host Persistent Memory: 112368
[09/24/2021-18:03:18] [I] [TRT] Total Device Persistent Memory: 872448
[09/24/2021-18:03:18] [I] [TRT] Total Scratch Memory: 0
[09/24/2021-18:03:18] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 27 MiB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1379, GPU 4995 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +3, now: CPU 1380, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1379, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1379, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1378 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] [TRT] Loaded engine size: 3 MB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1373 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1377, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1377, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1377, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1377 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] Engine built in 546.945 sec.
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1373 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 1374, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1374, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1374 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] Created input binding for input.1 with dimensions 1x3x640x352
[09/24/2021-18:03:18] [I] Created output binding for 528 with dimensions 1x9240x4
[09/24/2021-18:03:18] [I] Created output binding for 620 with dimensions 1x9240x10
[09/24/2021-18:03:18] [I] Created output binding for 621 with dimensions 1x9240x2
[09/24/2021-18:03:18] [I] Starting inference
[09/24/2021-18:03:21] [I] Warmup completed 69 queries over 200 ms
[09/24/2021-18:03:21] [I] Timing trace has 1043 queries over 3.00697 s
[09/24/2021-18:03:21] [I]
[09/24/2021-18:03:21] [I] === Trace details ===
[09/24/2021-18:03:21] [I] Trace averages of 10 runs:
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70639 ms - Host latency: 2.85475 ms (end to end 2.86387 ms, enqueue 2.47079 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70923 ms - Host latency: 2.85816 ms (end to end 2.86698 ms, enqueue 2.47612 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70533 ms - Host latency: 2.85456 ms (end to end 2.86412 ms, enqueue 2.42725 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71052 ms - Host latency: 2.85972 ms (end to end 2.86985 ms, enqueue 2.41327 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70642 ms - Host latency: 2.8557 ms (end to end 2.86628 ms, enqueue 2.40972 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70663 ms - Host latency: 2.85532 ms (end to end 2.86566 ms, enqueue 2.43371 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71177 ms - Host latency: 2.86067 ms (end to end 2.87164 ms, enqueue 2.44474 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70885 ms - Host latency: 2.85789 ms (end to end 2.86838 ms, enqueue 2.41823 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7094 ms - Host latency: 2.85828 ms (end to end 2.86775 ms, enqueue 2.39249 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71215 ms - Host latency: 2.86103 ms (end to end 2.87014 ms, enqueue 2.3898 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71315 ms - Host latency: 2.86225 ms (end to end 2.87265 ms, enqueue 2.35911 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71509 ms - Host latency: 2.8641 ms (end to end 2.87313 ms, enqueue 2.38124 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71351 ms - Host latency: 2.86283 ms (end to end 2.87363 ms, enqueue 2.40785 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71429 ms - Host latency: 2.86291 ms (end to end 2.87156 ms, enqueue 2.35654 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71497 ms - Host latency: 2.86378 ms (end to end 2.87404 ms, enqueue 2.37977 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71479 ms - Host latency: 2.86388 ms (end to end 2.87423 ms, enqueue 2.36373 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71691 ms - Host latency: 2.86591 ms (end to end 2.87681 ms, enqueue 2.34723 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71134 ms - Host latency: 2.86025 ms (end to end 2.86932 ms, enqueue 2.48779 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71472 ms - Host latency: 2.86376 ms (end to end 2.87513 ms, enqueue 2.37139 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71122 ms - Host latency: 2.86039 ms (end to end 2.87046 ms, enqueue 2.38494 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71832 ms - Host latency: 2.8676 ms (end to end 2.87749 ms, enqueue 2.34695 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71581 ms - Host latency: 2.8647 ms (end to end 2.87574 ms, enqueue 2.38444 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71778 ms - Host latency: 2.86743 ms (end to end 2.87744 ms, enqueue 2.31958 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71899 ms - Host latency: 2.86826 ms (end to end 2.87905 ms, enqueue 2.33584 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71989 ms - Host latency: 2.86898 ms (end to end 2.88007 ms, enqueue 2.32 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71978 ms - Host latency: 2.86901 ms (end to end 2.88066 ms, enqueue 2.34491 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71843 ms - Host latency: 2.86776 ms (end to end 2.87969 ms, enqueue 2.33533 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71785 ms - Host latency: 2.86656 ms (end to end 2.87456 ms, enqueue 2.30854 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71844 ms - Host latency: 2.86682 ms (end to end 2.87673 ms, enqueue 2.33306 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71946 ms - Host latency: 2.86829 ms (end to end 2.87955 ms, enqueue 2.31394 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72255 ms - Host latency: 2.87161 ms (end to end 2.88237 ms, enqueue 2.28561 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71885 ms - Host latency: 2.86763 ms (end to end 2.87814 ms, enqueue 2.31257 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71903 ms - Host latency: 2.86901 ms (end to end 2.8811 ms, enqueue 2.361 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72218 ms - Host latency: 2.87177 ms (end to end 2.88036 ms, enqueue 2.28973 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71862 ms - Host latency: 2.86781 ms (end to end 2.8786 ms, enqueue 2.32296 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72014 ms - Host latency: 2.86918 ms (end to end 2.8818 ms, enqueue 2.26718 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72277 ms - Host latency: 2.87177 ms (end to end 2.88297 ms, enqueue 2.2554 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72057 ms - Host latency: 2.86956 ms (end to end 2.88055 ms, enqueue 2.32466 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72396 ms - Host latency: 2.87297 ms (end to end 2.88419 ms, enqueue 2.28302 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71725 ms - Host latency: 2.86739 ms (end to end 2.8785 ms, enqueue 2.30909 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71965 ms - Host latency: 2.86887 ms (end to end 2.88091 ms, enqueue 2.30076 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72202 ms - Host latency: 2.87123 ms (end to end 2.88203 ms, enqueue 2.26357 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72335 ms - Host latency: 2.873 ms (end to end 2.88324 ms, enqueue 2.26787 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72119 ms - Host latency: 2.87006 ms (end to end 2.88112 ms, enqueue 2.23042 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72504 ms - Host latency: 2.87454 ms (end to end 2.88406 ms, enqueue 2.25394 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72554 ms - Host latency: 2.87407 ms (end to end 2.88607 ms, enqueue 2.21636 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72086 ms - Host latency: 2.86981 ms (end to end 2.88287 ms, enqueue 2.2452 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7245 ms - Host latency: 2.87317 ms (end to end 2.88436 ms, enqueue 2.23196 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72872 ms - Host latency: 2.87772 ms (end to end 2.88883 ms, enqueue 2.21051 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72706 ms - Host latency: 2.87556 ms (end to end 2.88739 ms, enqueue 2.21034 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72795 ms - Host latency: 2.87665 ms (end to end 2.88616 ms, enqueue 2.21237 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71896 ms - Host latency: 2.8681 ms (end to end 2.87992 ms, enqueue 2.22323 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71742 ms - Host latency: 2.8666 ms (end to end 2.87747 ms, enqueue 2.34011 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73164 ms - Host latency: 2.88026 ms (end to end 2.89055 ms, enqueue 2.23746 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72583 ms - Host latency: 2.87469 ms (end to end 2.88632 ms, enqueue 2.25873 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72374 ms - Host latency: 2.87352 ms (end to end 2.88542 ms, enqueue 2.24001 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72163 ms - Host latency: 2.8709 ms (end to end 2.88412 ms, enqueue 2.25905 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72494 ms - Host latency: 2.87427 ms (end to end 2.88723 ms, enqueue 2.24247 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72588 ms - Host latency: 2.87472 ms (end to end 2.88551 ms, enqueue 2.22769 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72556 ms - Host latency: 2.87434 ms (end to end 2.8855 ms, enqueue 2.20978 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72438 ms - Host latency: 2.87362 ms (end to end 2.88447 ms, enqueue 2.19883 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72251 ms - Host latency: 2.87129 ms (end to end 2.88204 ms, enqueue 2.21389 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72595 ms - Host latency: 2.87466 ms (end to end 2.8869 ms, enqueue 2.20353 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72393 ms - Host latency: 2.87378 ms (end to end 2.88608 ms, enqueue 2.20118 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72903 ms - Host latency: 2.87794 ms (end to end 2.88765 ms, enqueue 2.22356 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72593 ms - Host latency: 2.87532 ms (end to end 2.88752 ms, enqueue 2.2075 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72351 ms - Host latency: 2.87275 ms (end to end 2.88433 ms, enqueue 2.2041 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72571 ms - Host latency: 2.87483 ms (end to end 2.88606 ms, enqueue 2.24937 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72527 ms - Host latency: 2.87451 ms (end to end 2.88503 ms, enqueue 2.15012 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72883 ms - Host latency: 2.87827 ms (end to end 2.89048 ms, enqueue 2.15698 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72827 ms - Host latency: 2.87739 ms (end to end 2.88865 ms, enqueue 2.16128 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72861 ms - Host latency: 2.87764 ms (end to end 2.88801 ms, enqueue 2.16394 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73086 ms - Host latency: 2.87974 ms (end to end 2.89001 ms, enqueue 2.13909 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72595 ms - Host latency: 2.87593 ms (end to end 2.88835 ms, enqueue 2.14246 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72605 ms - Host latency: 2.87549 ms (end to end 2.88892 ms, enqueue 2.14102 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72881 ms - Host latency: 2.87747 ms (end to end 2.88958 ms, enqueue 2.16865 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72617 ms - Host latency: 2.87554 ms (end to end 2.8874 ms, enqueue 2.14993 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72297 ms - Host latency: 2.87178 ms (end to end 2.88345 ms, enqueue 2.16941 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72729 ms - Host latency: 2.87646 ms (end to end 2.88779 ms, enqueue 2.16853 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72605 ms - Host latency: 2.87493 ms (end to end 2.88796 ms, enqueue 2.1469 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72817 ms - Host latency: 2.8772 ms (end to end 2.88977 ms, enqueue 2.15818 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72676 ms - Host latency: 2.87561 ms (end to end 2.88535 ms, enqueue 2.15586 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.729 ms - Host latency: 2.878 ms (end to end 2.8928 ms, enqueue 2.14558 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7334 ms - Host latency: 2.8825 ms (end to end 2.89434 ms, enqueue 2.15833 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73293 ms - Host latency: 2.88237 ms (end to end 2.89509 ms, enqueue 2.16743 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72717 ms - Host latency: 2.87607 ms (end to end 2.88772 ms, enqueue 2.13076 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72773 ms - Host latency: 2.87712 ms (end to end 2.88965 ms, enqueue 2.15789 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7219 ms - Host latency: 2.87117 ms (end to end 2.88022 ms, enqueue 2.154 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72778 ms - Host latency: 2.87649 ms (end to end 2.88843 ms, enqueue 2.14954 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72512 ms - Host latency: 2.87446 ms (end to end 2.88699 ms, enqueue 2.13989 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72683 ms - Host latency: 2.87585 ms (end to end 2.88831 ms, enqueue 2.14424 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72576 ms - Host latency: 2.87451 ms (end to end 2.88564 ms, enqueue 2.12805 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72759 ms - Host latency: 2.87595 ms (end to end 2.88586 ms, enqueue 2.12693 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73059 ms - Host latency: 2.87937 ms (end to end 2.88926 ms, enqueue 2.13843 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72612 ms - Host latency: 2.87515 ms (end to end 2.88621 ms, enqueue 2.13638 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72756 ms - Host latency: 2.87742 ms (end to end 2.88691 ms, enqueue 2.14333 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72332 ms - Host latency: 2.8719 ms (end to end 2.88059 ms, enqueue 2.19648 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72971 ms - Host latency: 2.8783 ms (end to end 2.88728 ms, enqueue 2.14587 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72881 ms - Host latency: 2.878 ms (end to end 2.8875 ms, enqueue 2.1511 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72346 ms - Host latency: 2.87261 ms (end to end 2.88379 ms, enqueue 2.13662 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72776 ms - Host latency: 2.87666 ms (end to end 2.88899 ms, enqueue 2.20024 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72832 ms - Host latency: 2.87734 ms (end to end 2.89214 ms, enqueue 2.16211 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72888 ms - Host latency: 2.87791 ms (end to end 2.8876 ms, enqueue 2.14819 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73079 ms - Host latency: 2.87974 ms (end to end 2.88977 ms, enqueue 2.13779 ms)
[09/24/2021-18:03:21] [I]
[09/24/2021-18:03:21] [I] === Performance summary ===
[09/24/2021-18:03:21] [I] Throughput: 346.861 qps
[09/24/2021-18:03:21] [I] Latency: min = 2.82928 ms, max = 2.96851 ms, mean = 2.8712 ms, median = 2.87195 ms, percentile(99%) = 2.89172 ms
[09/24/2021-18:03:21] [I] End-to-End Host Latency: min = 2.8381 ms, max = 2.97803 ms, mean = 2.88218 ms, median = 2.88318 ms, percentile(99%) = 2.90479 ms
[09/24/2021-18:03:21] [I] Enqueue Time: min = 2.05884 ms, max = 2.83978 ms, mean = 2.25478 ms, median = 2.23706 ms, percentile(99%) = 2.55811 ms
[09/24/2021-18:03:21] [I] H2D Latency: min = 0.114624 ms, max = 0.126953 ms, mean = 0.115959 ms, median = 0.115845 ms, percentile(99%) = 0.120117 ms
[09/24/2021-18:03:21] [I] GPU Compute Time: min = 2.68076 ms, max = 2.81812 ms, mean = 2.72213 ms, median = 2.7229 ms, percentile(99%) = 2.74243 ms
[09/24/2021-18:03:21] [I] D2H Latency: min = 0.03125 ms, max = 0.0352783 ms, mean = 0.0331086 ms, median = 0.0331421 ms, percentile(99%) = 0.0344238 ms
[09/24/2021-18:03:21] [I] Total Host Walltime: 3.00697 s
[09/24/2021-18:03:21] [I] Total GPU Compute Time: 2.83918 s
[09/24/2021-18:03:21] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[09/24/2021-18:03:21] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[09/24/2021-18:03:21] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/24/2021-18:03:21] [I] &&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_250.onnx --fp16
[09/24/2021-18:03:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1374, GPU 4998 (MiB)
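
The figures in the performance summary can be cross-checked against each other with simple arithmetic. The snippet below is only a rough sketch: every number is copied by hand from the log above, the variable names are my own, and the two ratios at the end are an informal way to look at the enqueue warning. How trtexec itself decides to print that warning is not shown in this log.

```python
# Values copied by hand from the trtexec log above (not parsed automatically).
queries = 1043                 # "Timing trace has 1043 queries over 3.00697 s"
host_walltime_s = 3.00697      # "Total Host Walltime"
total_gpu_compute_s = 2.83918  # "Total GPU Compute Time"
mean_enqueue_ms = 2.25478      # "Enqueue Time: ... mean"
mean_gpu_compute_ms = 2.72213  # "GPU Compute Time: ... mean"

# Throughput is queries per second of host wall time.
throughput_qps = queries / host_walltime_s

# Mean GPU compute per query, derived from the total.
derived_mean_gpu_ms = total_gpu_compute_s / queries * 1000.0

# Informal ratios (my own, not trtexec metrics): share of wall time the GPU
# spent computing, and how close the host enqueue cost is to the GPU time.
gpu_busy_fraction = total_gpu_compute_s / host_walltime_s
enqueue_to_compute = mean_enqueue_ms / mean_gpu_compute_ms

print(f"throughput          : {throughput_qps:.3f} qps")
print(f"mean GPU compute    : {derived_mean_gpu_ms:.5f} ms")
print(f"GPU busy fraction   : {gpu_busy_fraction:.3f}")
print(f"enqueue / GPU ratio : {enqueue_to_compute:.3f}")
```

The derived throughput and mean GPU compute time agree with the reported 346.861 qps and 2.72213 ms. The mean enqueue time is a large fraction of the mean GPU compute time, which fits the warning that suggests trying --useCudaGraph.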