&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_250.onnx --fp16
[09/24/2021-17:54:11] [I] === Model Options ===
[09/24/2021-17:54:11] [I] Format: ONNX
[09/24/2021-17:54:11] [I] Model: /home/acer/nfs-share/epoch_250.onnx
[09/24/2021-17:54:11] [I] Output:
[09/24/2021-17:54:11] [I] === Build Options ===
[09/24/2021-17:54:11] [I] Max batch: explicit
[09/24/2021-17:54:11] [I] Workspace: 16 MiB
[09/24/2021-17:54:11] [I] minTiming: 1
[09/24/2021-17:54:11] [I] avgTiming: 8
[09/24/2021-17:54:11] [I] Precision: FP32+FP16
[09/24/2021-17:54:11] [I] Calibration: 
[09/24/2021-17:54:11] [I] Refit: Disabled
[09/24/2021-17:54:11] [I] Sparsity: Disabled
[09/24/2021-17:54:11] [I] Safe mode: Disabled
[09/24/2021-17:54:11] [I] Restricted mode: Disabled
[09/24/2021-17:54:11] [I] Save engine: 
[09/24/2021-17:54:11] [I] Load engine: 
[09/24/2021-17:54:11] [I] NVTX verbosity: 0
[09/24/2021-17:54:11] [I] Tactic sources: Using default tactic sources
[09/24/2021-17:54:11] [I] timingCacheMode: local
[09/24/2021-17:54:11] [I] timingCacheFile: 
[09/24/2021-17:54:11] [I] Input(s)s format: fp32:CHW
[09/24/2021-17:54:11] [I] Output(s)s format: fp32:CHW
[09/24/2021-17:54:11] [I] Input build shapes: model
[09/24/2021-17:54:11] [I] Input calibration shapes: model
[09/24/2021-17:54:11] [I] === System Options ===
[09/24/2021-17:54:11] [I] Device: 0
[09/24/2021-17:54:11] [I] DLACore: 
[09/24/2021-17:54:11] [I] Plugins:
[09/24/2021-17:54:11] [I] === Inference Options ===
[09/24/2021-17:54:11] [I] Batch: Explicit
[09/24/2021-17:54:11] [I] Input inference shapes: model
[09/24/2021-17:54:11] [I] Iterations: 10
[09/24/2021-17:54:11] [I] Duration: 3s (+ 200ms warm up)
[09/24/2021-17:54:11] [I] Sleep time: 0ms
[09/24/2021-17:54:11] [I] Streams: 1
[09/24/2021-17:54:11] [I] ExposeDMA: Disabled
[09/24/2021-17:54:11] [I] Data transfers: Enabled
[09/24/2021-17:54:11] [I] Spin-wait: Disabled
[09/24/2021-17:54:11] [I] Multithreading: Disabled
[09/24/2021-17:54:11] [I] CUDA Graph: Disabled
[09/24/2021-17:54:11] [I] Separate profiling: Disabled
[09/24/2021-17:54:11] [I] Time Deserialize: Disabled
[09/24/2021-17:54:11] [I] Time Refit: Disabled
[09/24/2021-17:54:11] [I] Skip inference: Disabled
[09/24/2021-17:54:11] [I] Inputs:
[09/24/2021-17:54:11] [I] === Reporting Options ===
[09/24/2021-17:54:11] [I] Verbose: Disabled
[09/24/2021-17:54:11] [I] Averages: 10 inferences
[09/24/2021-17:54:11] [I] Percentile: 99
[09/24/2021-17:54:11] [I] Dump refittable layers:Disabled
[09/24/2021-17:54:11] [I] Dump output: Disabled
[09/24/2021-17:54:11] [I] Profile: Disabled
[09/24/2021-17:54:11] [I] Export timing to JSON file: 
[09/24/2021-17:54:11] [I] Export output to JSON file: 
[09/24/2021-17:54:11] [I] Export profile to JSON file: 
[09/24/2021-17:54:11] [I] 
[09/24/2021-17:54:11] [I] === Device Information ===
[09/24/2021-17:54:11] [I] Selected Device: Xavier
[09/24/2021-17:54:11] [I] Compute Capability: 7.2
[09/24/2021-17:54:11] [I] SMs: 6
[09/24/2021-17:54:11] [I] Compute Clock Rate: 1.109 GHz
[09/24/2021-17:54:11] [I] Device Global Memory: 7773 MiB
[09/24/2021-17:54:11] [I] Shared Memory per SM: 96 KiB
[09/24/2021-17:54:11] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/24/2021-17:54:11] [I] Memory Clock Rate: 1.109 GHz
[09/24/2021-17:54:11] [I] 
[09/24/2021-17:54:11] [I] TensorRT version: 8001
[09/24/2021-17:54:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 3724 (MiB)
[09/24/2021-17:54:12] [I] Start parsing network model
[09/24/2021-17:54:12] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-17:54:12] [I] [TRT] Input filename:   /home/acer/nfs-share/epoch_250.onnx
[09/24/2021-17:54:12] [I] [TRT] ONNX IR version:  0.0.6
[09/24/2021-17:54:12] [I] [TRT] Opset version:    13
[09/24/2021-17:54:12] [I] [TRT] Producer name:    pytorch
[09/24/2021-17:54:12] [I] [TRT] Producer version: 1.8
[09/24/2021-17:54:12] [I] [TRT] Domain:           
[09/24/2021-17:54:12] [I] [TRT] Model version:    0
[09/24/2021-17:54:12] [I] [TRT] Doc string:       
[09/24/2021-17:54:12] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-17:54:12] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/24/2021-17:54:12] [I] Finish parsing network model
[09/24/2021-17:54:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 374, GPU 3730 (MiB)
[09/24/2021-17:54:12] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 374 MiB, GPU 3730 MiB
[09/24/2021-17:54:12] [I] [TRT] ---------- Layers Running on DLA ----------
[09/24/2021-17:54:12] [I] [TRT] ---------- Layers Running on GPU ----------
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_0
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_1
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_2
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_3
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_4
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_5
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_6
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_7
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_8
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_9
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_10
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_11
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_12
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_13
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_14
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_15
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_16
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_17
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_18
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_19
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_20
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_21
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_22
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_54
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_23
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_24
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_25
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_26
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_27
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_28
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_29
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_30
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_31
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_32
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_33
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_34
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_35
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_36
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_37
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_38
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_39
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_40
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_41
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_42
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_43
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_44
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_45
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_46
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_56
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_47
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_48
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_49
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_50
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_51
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_52
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_53
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_58
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_59
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_122 || Conv_123
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_124
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_125 || Conv_126
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Resize_78
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_127
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(LeakyRelu_57, Add_79)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_128
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_80
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 748 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 754 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_81
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(Relu_130)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_113 || Conv_114
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_149 || Conv_177 || Conv_205
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_115
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_116 || Conv_117
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Resize_100
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_118
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(LeakyRelu_55, Add_101)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_119
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_102
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 733 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 739 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_103
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(Relu_121)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_150 + Reshape_157
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_178 + Reshape_185
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_206 + Reshape_213
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_104 || Conv_105
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_140 || Conv_168 || Conv_196
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_106
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_107 || Conv_108
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] LeakyRelu_109
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_110
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 718 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 724 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] PWN(Relu_112)
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_141 + Reshape_148
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_169 + Reshape_176
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_197 + Reshape_204
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Conv_131 || Conv_159 || Conv_187
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_132 + Reshape_139
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_160 + Reshape_167
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Transpose_188 + Reshape_195
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 497 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 512 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 527 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 543 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 558 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 573 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 589 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 604 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] 619 copy
[09/24/2021-17:54:12] [I] [TRT] [GpuLayer] Softmax_215
[09/24/2021-17:54:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +225, now: CPU 601, GPU 3955 (MiB)
[09/24/2021-17:54:14] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +309, now: CPU 908, GPU 4264 (MiB)
[09/24/2021-17:54:14] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[09/24/2021-17:56:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/24/2021-18:03:18] [I] [TRT] Detected 1 inputs and 9 output network tensors.
[09/24/2021-18:03:18] [I] [TRT] Total Host Persistent Memory: 112368
[09/24/2021-18:03:18] [I] [TRT] Total Device Persistent Memory: 872448
[09/24/2021-18:03:18] [I] [TRT] Total Scratch Memory: 0
[09/24/2021-18:03:18] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 27 MiB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1379, GPU 4995 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +3, now: CPU 1380, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1379, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1379, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1378 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] [TRT] Loaded engine size: 3 MB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1373 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1377, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1377, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1377, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1377 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] Engine built in 546.945 sec.
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1373 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 1374, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1374, GPU 4998 (MiB)
[09/24/2021-18:03:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1374 MiB, GPU 4998 MiB
[09/24/2021-18:03:18] [I] Created input binding for input.1 with dimensions 1x3x640x352
[09/24/2021-18:03:18] [I] Created output binding for 528 with dimensions 1x9240x4
[09/24/2021-18:03:18] [I] Created output binding for 620 with dimensions 1x9240x10
[09/24/2021-18:03:18] [I] Created output binding for 621 with dimensions 1x9240x2
[09/24/2021-18:03:18] [I] Starting inference
[09/24/2021-18:03:21] [I] Warmup completed 69 queries over 200 ms
[09/24/2021-18:03:21] [I] Timing trace has 1043 queries over 3.00697 s
[09/24/2021-18:03:21] [I] 
[09/24/2021-18:03:21] [I] === Trace details ===
[09/24/2021-18:03:21] [I] Trace averages of 10 runs:
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70639 ms - Host latency: 2.85475 ms (end to end 2.86387 ms, enqueue 2.47079 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70923 ms - Host latency: 2.85816 ms (end to end 2.86698 ms, enqueue 2.47612 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70533 ms - Host latency: 2.85456 ms (end to end 2.86412 ms, enqueue 2.42725 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71052 ms - Host latency: 2.85972 ms (end to end 2.86985 ms, enqueue 2.41327 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70642 ms - Host latency: 2.8557 ms (end to end 2.86628 ms, enqueue 2.40972 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70663 ms - Host latency: 2.85532 ms (end to end 2.86566 ms, enqueue 2.43371 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71177 ms - Host latency: 2.86067 ms (end to end 2.87164 ms, enqueue 2.44474 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.70885 ms - Host latency: 2.85789 ms (end to end 2.86838 ms, enqueue 2.41823 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7094 ms - Host latency: 2.85828 ms (end to end 2.86775 ms, enqueue 2.39249 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71215 ms - Host latency: 2.86103 ms (end to end 2.87014 ms, enqueue 2.3898 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71315 ms - Host latency: 2.86225 ms (end to end 2.87265 ms, enqueue 2.35911 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71509 ms - Host latency: 2.8641 ms (end to end 2.87313 ms, enqueue 2.38124 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71351 ms - Host latency: 2.86283 ms (end to end 2.87363 ms, enqueue 2.40785 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71429 ms - Host latency: 2.86291 ms (end to end 2.87156 ms, enqueue 2.35654 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71497 ms - Host latency: 2.86378 ms (end to end 2.87404 ms, enqueue 2.37977 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71479 ms - Host latency: 2.86388 ms (end to end 2.87423 ms, enqueue 2.36373 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71691 ms - Host latency: 2.86591 ms (end to end 2.87681 ms, enqueue 2.34723 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71134 ms - Host latency: 2.86025 ms (end to end 2.86932 ms, enqueue 2.48779 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71472 ms - Host latency: 2.86376 ms (end to end 2.87513 ms, enqueue 2.37139 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71122 ms - Host latency: 2.86039 ms (end to end 2.87046 ms, enqueue 2.38494 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71832 ms - Host latency: 2.8676 ms (end to end 2.87749 ms, enqueue 2.34695 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71581 ms - Host latency: 2.8647 ms (end to end 2.87574 ms, enqueue 2.38444 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71778 ms - Host latency: 2.86743 ms (end to end 2.87744 ms, enqueue 2.31958 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71899 ms - Host latency: 2.86826 ms (end to end 2.87905 ms, enqueue 2.33584 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71989 ms - Host latency: 2.86898 ms (end to end 2.88007 ms, enqueue 2.32 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71978 ms - Host latency: 2.86901 ms (end to end 2.88066 ms, enqueue 2.34491 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71843 ms - Host latency: 2.86776 ms (end to end 2.87969 ms, enqueue 2.33533 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71785 ms - Host latency: 2.86656 ms (end to end 2.87456 ms, enqueue 2.30854 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71844 ms - Host latency: 2.86682 ms (end to end 2.87673 ms, enqueue 2.33306 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71946 ms - Host latency: 2.86829 ms (end to end 2.87955 ms, enqueue 2.31394 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72255 ms - Host latency: 2.87161 ms (end to end 2.88237 ms, enqueue 2.28561 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71885 ms - Host latency: 2.86763 ms (end to end 2.87814 ms, enqueue 2.31257 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71903 ms - Host latency: 2.86901 ms (end to end 2.8811 ms, enqueue 2.361 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72218 ms - Host latency: 2.87177 ms (end to end 2.88036 ms, enqueue 2.28973 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71862 ms - Host latency: 2.86781 ms (end to end 2.8786 ms, enqueue 2.32296 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72014 ms - Host latency: 2.86918 ms (end to end 2.8818 ms, enqueue 2.26718 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72277 ms - Host latency: 2.87177 ms (end to end 2.88297 ms, enqueue 2.2554 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72057 ms - Host latency: 2.86956 ms (end to end 2.88055 ms, enqueue 2.32466 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72396 ms - Host latency: 2.87297 ms (end to end 2.88419 ms, enqueue 2.28302 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71725 ms - Host latency: 2.86739 ms (end to end 2.8785 ms, enqueue 2.30909 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71965 ms - Host latency: 2.86887 ms (end to end 2.88091 ms, enqueue 2.30076 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72202 ms - Host latency: 2.87123 ms (end to end 2.88203 ms, enqueue 2.26357 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72335 ms - Host latency: 2.873 ms (end to end 2.88324 ms, enqueue 2.26787 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72119 ms - Host latency: 2.87006 ms (end to end 2.88112 ms, enqueue 2.23042 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72504 ms - Host latency: 2.87454 ms (end to end 2.88406 ms, enqueue 2.25394 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72554 ms - Host latency: 2.87407 ms (end to end 2.88607 ms, enqueue 2.21636 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72086 ms - Host latency: 2.86981 ms (end to end 2.88287 ms, enqueue 2.2452 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7245 ms - Host latency: 2.87317 ms (end to end 2.88436 ms, enqueue 2.23196 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72872 ms - Host latency: 2.87772 ms (end to end 2.88883 ms, enqueue 2.21051 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72706 ms - Host latency: 2.87556 ms (end to end 2.88739 ms, enqueue 2.21034 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72795 ms - Host latency: 2.87665 ms (end to end 2.88616 ms, enqueue 2.21237 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71896 ms - Host latency: 2.8681 ms (end to end 2.87992 ms, enqueue 2.22323 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.71742 ms - Host latency: 2.8666 ms (end to end 2.87747 ms, enqueue 2.34011 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73164 ms - Host latency: 2.88026 ms (end to end 2.89055 ms, enqueue 2.23746 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72583 ms - Host latency: 2.87469 ms (end to end 2.88632 ms, enqueue 2.25873 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72374 ms - Host latency: 2.87352 ms (end to end 2.88542 ms, enqueue 2.24001 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72163 ms - Host latency: 2.8709 ms (end to end 2.88412 ms, enqueue 2.25905 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72494 ms - Host latency: 2.87427 ms (end to end 2.88723 ms, enqueue 2.24247 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72588 ms - Host latency: 2.87472 ms (end to end 2.88551 ms, enqueue 2.22769 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72556 ms - Host latency: 2.87434 ms (end to end 2.8855 ms, enqueue 2.20978 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72438 ms - Host latency: 2.87362 ms (end to end 2.88447 ms, enqueue 2.19883 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72251 ms - Host latency: 2.87129 ms (end to end 2.88204 ms, enqueue 2.21389 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72595 ms - Host latency: 2.87466 ms (end to end 2.8869 ms, enqueue 2.20353 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72393 ms - Host latency: 2.87378 ms (end to end 2.88608 ms, enqueue 2.20118 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72903 ms - Host latency: 2.87794 ms (end to end 2.88765 ms, enqueue 2.22356 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72593 ms - Host latency: 2.87532 ms (end to end 2.88752 ms, enqueue 2.2075 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72351 ms - Host latency: 2.87275 ms (end to end 2.88433 ms, enqueue 2.2041 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72571 ms - Host latency: 2.87483 ms (end to end 2.88606 ms, enqueue 2.24937 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72527 ms - Host latency: 2.87451 ms (end to end 2.88503 ms, enqueue 2.15012 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72883 ms - Host latency: 2.87827 ms (end to end 2.89048 ms, enqueue 2.15698 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72827 ms - Host latency: 2.87739 ms (end to end 2.88865 ms, enqueue 2.16128 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72861 ms - Host latency: 2.87764 ms (end to end 2.88801 ms, enqueue 2.16394 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73086 ms - Host latency: 2.87974 ms (end to end 2.89001 ms, enqueue 2.13909 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72595 ms - Host latency: 2.87593 ms (end to end 2.88835 ms, enqueue 2.14246 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72605 ms - Host latency: 2.87549 ms (end to end 2.88892 ms, enqueue 2.14102 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72881 ms - Host latency: 2.87747 ms (end to end 2.88958 ms, enqueue 2.16865 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72617 ms - Host latency: 2.87554 ms (end to end 2.8874 ms, enqueue 2.14993 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72297 ms - Host latency: 2.87178 ms (end to end 2.88345 ms, enqueue 2.16941 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72729 ms - Host latency: 2.87646 ms (end to end 2.88779 ms, enqueue 2.16853 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72605 ms - Host latency: 2.87493 ms (end to end 2.88796 ms, enqueue 2.1469 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72817 ms - Host latency: 2.8772 ms (end to end 2.88977 ms, enqueue 2.15818 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72676 ms - Host latency: 2.87561 ms (end to end 2.88535 ms, enqueue 2.15586 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.729 ms - Host latency: 2.878 ms (end to end 2.8928 ms, enqueue 2.14558 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7334 ms - Host latency: 2.8825 ms (end to end 2.89434 ms, enqueue 2.15833 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73293 ms - Host latency: 2.88237 ms (end to end 2.89509 ms, enqueue 2.16743 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72717 ms - Host latency: 2.87607 ms (end to end 2.88772 ms, enqueue 2.13076 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72773 ms - Host latency: 2.87712 ms (end to end 2.88965 ms, enqueue 2.15789 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.7219 ms - Host latency: 2.87117 ms (end to end 2.88022 ms, enqueue 2.154 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72778 ms - Host latency: 2.87649 ms (end to end 2.88843 ms, enqueue 2.14954 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72512 ms - Host latency: 2.87446 ms (end to end 2.88699 ms, enqueue 2.13989 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72683 ms - Host latency: 2.87585 ms (end to end 2.88831 ms, enqueue 2.14424 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72576 ms - Host latency: 2.87451 ms (end to end 2.88564 ms, enqueue 2.12805 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72759 ms - Host latency: 2.87595 ms (end to end 2.88586 ms, enqueue 2.12693 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73059 ms - Host latency: 2.87937 ms (end to end 2.88926 ms, enqueue 2.13843 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72612 ms - Host latency: 2.87515 ms (end to end 2.88621 ms, enqueue 2.13638 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72756 ms - Host latency: 2.87742 ms (end to end 2.88691 ms, enqueue 2.14333 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72332 ms - Host latency: 2.8719 ms (end to end 2.88059 ms, enqueue 2.19648 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72971 ms - Host latency: 2.8783 ms (end to end 2.88728 ms, enqueue 2.14587 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72881 ms - Host latency: 2.878 ms (end to end 2.8875 ms, enqueue 2.1511 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72346 ms - Host latency: 2.87261 ms (end to end 2.88379 ms, enqueue 2.13662 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72776 ms - Host latency: 2.87666 ms (end to end 2.88899 ms, enqueue 2.20024 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72832 ms - Host latency: 2.87734 ms (end to end 2.89214 ms, enqueue 2.16211 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.72888 ms - Host latency: 2.87791 ms (end to end 2.8876 ms, enqueue 2.14819 ms)
[09/24/2021-18:03:21] [I] Average on 10 runs - GPU latency: 2.73079 ms - Host latency: 2.87974 ms (end to end 2.88977 ms, enqueue 2.13779 ms)
[09/24/2021-18:03:21] [I] 
[09/24/2021-18:03:21] [I] === Performance summary ===
[09/24/2021-18:03:21] [I] Throughput: 346.861 qps
[09/24/2021-18:03:21] [I] Latency: min = 2.82928 ms, max = 2.96851 ms, mean = 2.8712 ms, median = 2.87195 ms, percentile(99%) = 2.89172 ms
[09/24/2021-18:03:21] [I] End-to-End Host Latency: min = 2.8381 ms, max = 2.97803 ms, mean = 2.88218 ms, median = 2.88318 ms, percentile(99%) = 2.90479 ms
[09/24/2021-18:03:21] [I] Enqueue Time: min = 2.05884 ms, max = 2.83978 ms, mean = 2.25478 ms, median = 2.23706 ms, percentile(99%) = 2.55811 ms
[09/24/2021-18:03:21] [I] H2D Latency: min = 0.114624 ms, max = 0.126953 ms, mean = 0.115959 ms, median = 0.115845 ms, percentile(99%) = 0.120117 ms
[09/24/2021-18:03:21] [I] GPU Compute Time: min = 2.68076 ms, max = 2.81812 ms, mean = 2.72213 ms, median = 2.7229 ms, percentile(99%) = 2.74243 ms
[09/24/2021-18:03:21] [I] D2H Latency: min = 0.03125 ms, max = 0.0352783 ms, mean = 0.0331086 ms, median = 0.0331421 ms, percentile(99%) = 0.0344238 ms
[09/24/2021-18:03:21] [I] Total Host Walltime: 3.00697 s
[09/24/2021-18:03:21] [I] Total GPU Compute Time: 2.83918 s
[09/24/2021-18:03:21] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[09/24/2021-18:03:21] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[09/24/2021-18:03:21] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/24/2021-18:03:21] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_250.onnx --fp16
[09/24/2021-18:03:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1374, GPU 4998 (MiB)
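
The figures in the performance summary above can be cross-checked with a little arithmetic. The sketch below is not part of the trtexec output. It assumes that the reported throughput is the number of timed queries divided by the total host walltime, and that the "Latency" line is the sum of the mean H2D latency, GPU compute time, and D2H latency. All numbers are copied from the log; the variable names are illustrative only.

```python
# Values copied from the "Timing trace" and "Performance summary" lines of the log.
queries = 1043                 # "Timing trace has 1043 queries over 3.00697 s"
host_walltime_s = 3.00697      # "Total Host Walltime"
h2d_mean_ms = 0.115959         # "H2D Latency" mean
gpu_compute_mean_ms = 2.72213  # "GPU Compute Time" mean
d2h_mean_ms = 0.0331086        # "D2H Latency" mean

reported_qps = 346.861         # "Throughput"
reported_latency_ms = 2.8712   # "Latency" mean

# Assumption: throughput = timed queries / total host walltime.
throughput_qps = queries / host_walltime_s

# Assumption: host latency = H2D copy + GPU compute + D2H copy.
latency_ms = h2d_mean_ms + gpu_compute_mean_ms + d2h_mean_ms

print(f"throughput:   {throughput_qps:.3f} qps (reported {reported_qps})")
print(f"host latency: {latency_ms:.4f} ms (reported {reported_latency_ms})")
```

Under these assumptions both recomputed values agree with the reported ones to within rounding. The copies in and out account for only about 0.15 ms of the roughly 2.87 ms per query, so GPU compute dominates the per-query latency.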