&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_15.onnx --int8
[09/24/2021-18:04:06] [I] === Model Options ===
[09/24/2021-18:04:06] [I] Format: ONNX
[09/24/2021-18:04:06] [I] Model: /home/acer/nfs-share/epoch_15.onnx
[09/24/2021-18:04:06] [I] Output:
[09/24/2021-18:04:06] [I] === Build Options ===
[09/24/2021-18:04:06] [I] Max batch: explicit
[09/24/2021-18:04:06] [I] Workspace: 16 MiB
[09/24/2021-18:04:06] [I] minTiming: 1
[09/24/2021-18:04:06] [I] avgTiming: 8
[09/24/2021-18:04:06] [I] Precision: FP32+INT8
[09/24/2021-18:04:06] [I] Calibration: Dynamic
[09/24/2021-18:04:06] [I] Refit: Disabled
[09/24/2021-18:04:06] [I] Sparsity: Disabled
[09/24/2021-18:04:06] [I] Safe mode: Disabled
[09/24/2021-18:04:06] [I] Restricted mode: Disabled
[09/24/2021-18:04:06] [I] Save engine: 
[09/24/2021-18:04:06] [I] Load engine: 
[09/24/2021-18:04:06] [I] NVTX verbosity: 0
[09/24/2021-18:04:06] [I] Tactic sources: Using default tactic sources
[09/24/2021-18:04:06] [I] timingCacheMode: local
[09/24/2021-18:04:06] [I] timingCacheFile: 
[09/24/2021-18:04:06] [I] Input(s)s format: fp32:CHW
[09/24/2021-18:04:06] [I] Output(s)s format: fp32:CHW
[09/24/2021-18:04:06] [I] Input build shapes: model
[09/24/2021-18:04:06] [I] Input calibration shapes: model
[09/24/2021-18:04:06] [I] === System Options ===
[09/24/2021-18:04:06] [I] Device: 0
[09/24/2021-18:04:06] [I] DLACore: 
[09/24/2021-18:04:06] [I] Plugins:
[09/24/2021-18:04:06] [I] === Inference Options ===
[09/24/2021-18:04:06] [I] Batch: Explicit
[09/24/2021-18:04:06] [I] Input inference shapes: model
[09/24/2021-18:04:06] [I] Iterations: 10
[09/24/2021-18:04:06] [I] Duration: 3s (+ 200ms warm up)
[09/24/2021-18:04:06] [I] Sleep time: 0ms
[09/24/2021-18:04:06] [I] Streams: 1
[09/24/2021-18:04:06] [I] ExposeDMA: Disabled
[09/24/2021-18:04:06] [I] Data transfers: Enabled
[09/24/2021-18:04:06] [I] Spin-wait: Disabled
[09/24/2021-18:04:06] [I] Multithreading: Disabled
[09/24/2021-18:04:06] [I] CUDA Graph: Disabled
[09/24/2021-18:04:06] [I] Separate profiling: Disabled
[09/24/2021-18:04:06] [I] Time Deserialize: Disabled
[09/24/2021-18:04:06] [I] Time Refit: Disabled
[09/24/2021-18:04:06] [I] Skip inference: Disabled
[09/24/2021-18:04:06] [I] Inputs:
[09/24/2021-18:04:06] [I] === Reporting Options ===
[09/24/2021-18:04:06] [I] Verbose: Disabled
[09/24/2021-18:04:06] [I] Averages: 10 inferences
[09/24/2021-18:04:06] [I] Percentile: 99
[09/24/2021-18:04:06] [I] Dump refittable layers:Disabled
[09/24/2021-18:04:06] [I] Dump output: Disabled
[09/24/2021-18:04:06] [I] Profile: Disabled
[09/24/2021-18:04:06] [I] Export timing to JSON file: 
[09/24/2021-18:04:06] [I] Export output to JSON file: 
[09/24/2021-18:04:06] [I] Export profile to JSON file: 
[09/24/2021-18:04:06] [I] 
[09/24/2021-18:04:06] [I] === Device Information ===
[09/24/2021-18:04:06] [I] Selected Device: Xavier
[09/24/2021-18:04:06] [I] Compute Capability: 7.2
[09/24/2021-18:04:06] [I] SMs: 6
[09/24/2021-18:04:06] [I] Compute Clock Rate: 1.109 GHz
[09/24/2021-18:04:06] [I] Device Global Memory: 7773 MiB
[09/24/2021-18:04:06] [I] Shared Memory per SM: 96 KiB
[09/24/2021-18:04:06] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/24/2021-18:04:06] [I] Memory Clock Rate: 1.109 GHz
[09/24/2021-18:04:06] [I] 
[09/24/2021-18:04:06] [I] TensorRT version: 8001
[09/24/2021-18:04:07] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 3738 (MiB)
[09/24/2021-18:04:07] [I] Start parsing network model
[09/24/2021-18:04:07] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-18:04:07] [I] [TRT] Input filename:   /home/acer/nfs-share/epoch_15.onnx
[09/24/2021-18:04:07] [I] [TRT] ONNX IR version:  0.0.6
[09/24/2021-18:04:07] [I] [TRT] Opset version:    13
[09/24/2021-18:04:07] [I] [TRT] Producer name:    pytorch
[09/24/2021-18:04:07] [I] [TRT] Producer version: 1.8
[09/24/2021-18:04:07] [I] [TRT] Domain:           
[09/24/2021-18:04:07] [I] [TRT] Model version:    0
[09/24/2021-18:04:07] [I] [TRT] Doc string:       
[09/24/2021-18:04:07] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-18:04:07] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/24/2021-18:04:09] [I] Finish parsing network model
[09/24/2021-18:04:09] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 376, GPU 3747 (MiB)
[09/24/2021-18:04:09] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[09/24/2021-18:04:09] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 376 MiB, GPU 3747 MiB
[09/24/2021-18:04:09] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[09/24/2021-18:04:09] [I] [TRT] ---------- Layers Running on DLA ----------
[09/24/2021-18:04:09] [I] [TRT] ---------- Layers Running on GPU ----------
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] QuantizeLinear_2_quantize_scale_node
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.0.0.weight + QuantizeLinear_7_quantize_scale_node + Conv_9
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_11
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.1.0.weight + QuantizeLinear_19_quantize_scale_node + Conv_21
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_23
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.1.3.weight + QuantizeLinear_31_quantize_scale_node + Conv_33
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_35
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.2.0.weight + QuantizeLinear_43_quantize_scale_node + Conv_45
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_47
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.2.3.weight + QuantizeLinear_55_quantize_scale_node + Conv_57
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_59
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.3.0.weight + QuantizeLinear_67_quantize_scale_node + Conv_69
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_71
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.3.3.weight + QuantizeLinear_79_quantize_scale_node + Conv_81
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_83
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.4.0.weight + QuantizeLinear_91_quantize_scale_node + Conv_93
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_95
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.4.3.weight + QuantizeLinear_103_quantize_scale_node + Conv_105
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_107
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.5.0.weight + QuantizeLinear_115_quantize_scale_node + Conv_117
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_119
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.5.3.weight + QuantizeLinear_127_quantize_scale_node + Conv_129
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_131
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.0.0.weight + QuantizeLinear_139_quantize_scale_node + Conv_141
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.output1.0.weight + QuantizeLinear_331_quantize_scale_node + Conv_333
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_143
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.0.3.weight + QuantizeLinear_151_quantize_scale_node + Conv_153
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_155
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.1.0.weight + QuantizeLinear_163_quantize_scale_node + Conv_165
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_167
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.1.3.weight + QuantizeLinear_175_quantize_scale_node + Conv_177
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_179
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.2.0.weight + QuantizeLinear_187_quantize_scale_node + Conv_189
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_191
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.2.3.weight + QuantizeLinear_199_quantize_scale_node + Conv_201
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_203
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.3.0.weight + QuantizeLinear_211_quantize_scale_node + Conv_213
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_215
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.3.3.weight + QuantizeLinear_223_quantize_scale_node + Conv_225
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_227
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.4.0.weight + QuantizeLinear_235_quantize_scale_node + Conv_237
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_239
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.4.3.weight + QuantizeLinear_247_quantize_scale_node + Conv_249
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_251
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.5.0.weight + QuantizeLinear_259_quantize_scale_node + Conv_261
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_263
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.5.3.weight + QuantizeLinear_271_quantize_scale_node + Conv_273
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_275
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.0.0.weight + QuantizeLinear_283_quantize_scale_node + Conv_285
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.output2.0.weight + QuantizeLinear_343_quantize_scale_node + Conv_345
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_287
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.0.3.weight + QuantizeLinear_295_quantize_scale_node + Conv_297
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_299
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.1.0.weight + QuantizeLinear_307_quantize_scale_node + Conv_309
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_311
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.1.3.weight + QuantizeLinear_319_quantize_scale_node + Conv_321
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_323
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.output3.0.weight + QuantizeLinear_355_quantize_scale_node + Conv_357
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_359
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] QuantizeLinear_544_quantize_scale_node
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv3X3.0.weight + QuantizeLinear_549_quantize_scale_node + Conv_551
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv5X5_1.0.weight + QuantizeLinear_560_quantize_scale_node + Conv_562
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Resize_378
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(LeakyRelu_347, Add_379)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_564
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.merge2.0.weight + QuantizeLinear_387_quantize_scale_node + Conv_389
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv5X5_2.0.weight + QuantizeLinear_572_quantize_scale_node + Conv_574
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv7X7_2.0.weight + QuantizeLinear_583_quantize_scale_node + Conv_585
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_391
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_587
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] QuantizeLinear_485_quantize_scale_node
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv3X3.0.weight + QuantizeLinear_490_quantize_scale_node + Conv_492
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv5X5_1.0.weight + QuantizeLinear_501_quantize_scale_node + Conv_503
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv7x7_3.0.weight + QuantizeLinear_595_quantize_scale_node + Conv_597
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Resize_410
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(LeakyRelu_335, Add_411)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_505
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(Relu_600)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.merge1.0.weight + QuantizeLinear_419_quantize_scale_node + Conv_421
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv5X5_2.0.weight + QuantizeLinear_513_quantize_scale_node + Conv_515
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv7X7_2.0.weight + QuantizeLinear_524_quantize_scale_node + Conv_526
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] BboxHead.2.conv1x1.weight + QuantizeLinear_644_quantize_scale_node + Conv_646
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ClassHead.2.conv1x1.weight + QuantizeLinear_699_quantize_scale_node + Conv_701
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LandmarkHead.2.conv1x1.weight + QuantizeLinear_754_quantize_scale_node + Conv_756
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_423
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_528
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv3X3.0.weight + QuantizeLinear_431_quantize_scale_node + Conv_433
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv5X5_1.0.weight + QuantizeLinear_442_quantize_scale_node + Conv_444
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv7x7_3.0.weight + QuantizeLinear_536_quantize_scale_node + Conv_538
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_446
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_647 + Reshape_654
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_702 + Reshape_709
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_757 + Reshape_764
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(Relu_541)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv5X5_2.0.weight + QuantizeLinear_454_quantize_scale_node + Conv_456
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv7X7_2.0.weight + QuantizeLinear_465_quantize_scale_node + Conv_467
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] BboxHead.1.conv1x1.weight + QuantizeLinear_626_quantize_scale_node + Conv_628
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ClassHead.1.conv1x1.weight + QuantizeLinear_681_quantize_scale_node + Conv_683
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LandmarkHead.1.conv1x1.weight + QuantizeLinear_736_quantize_scale_node + Conv_738
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_469
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv7x7_3.0.weight + QuantizeLinear_477_quantize_scale_node + Conv_479
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_629 + Reshape_636
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_684 + Reshape_691
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_739 + Reshape_746
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(Relu_482)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] BboxHead.0.conv1x1.weight + QuantizeLinear_608_quantize_scale_node + Conv_610
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ClassHead.0.conv1x1.weight + QuantizeLinear_663_quantize_scale_node + Conv_665
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LandmarkHead.0.conv1x1.weight + QuantizeLinear_718_quantize_scale_node + Conv_720
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_611 + Reshape_618
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_666 + Reshape_673
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_721 + Reshape_728
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1128 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1154 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1180 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1207 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1233 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1259 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1286 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1312 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1338 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Softmax_766
[09/24/2021-18:04:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +230, now: CPU 605, GPU 3977 (MiB)
[09/24/2021-18:04:11] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +391, now: CPU 912, GPU 4368 (MiB)
[09/24/2021-18:04:11] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[09/24/2021-18:07:25] [I] [TRT] Detected 1 inputs and 9 output network tensors.
[09/24/2021-18:07:25] [I] [TRT] Total Host Persistent Memory: 115600
[09/24/2021-18:07:25] [I] [TRT] Total Device Persistent Memory: 1679872
[09/24/2021-18:07:25] [I] [TRT] Total Scratch Memory: 0
[09/24/2021-18:07:25] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 18 MiB
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 933, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 934, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 933, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 933, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 931 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] [TRT] Loaded engine size: 4 MB
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 925 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 931, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 931, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 931, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 931 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] Engine built in 199.796 sec.
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 926 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 926, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 926, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 926 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] Created input binding for inputs.1 with dimensions 1x3x640x352
[09/24/2021-18:07:26] [I] Created output binding for 1181 with dimensions 1x9240x4
[09/24/2021-18:07:26] [I] Created output binding for 1339 with dimensions 1x9240x10
[09/24/2021-18:07:26] [I] Created output binding for 1340 with dimensions 1x9240x2
[09/24/2021-18:07:26] [I] Starting inference
[09/24/2021-18:07:29] [I] Warmup completed 43 queries over 200 ms
[09/24/2021-18:07:29] [I] Timing trace has 656 queries over 3.01036 s
[09/24/2021-18:07:29] [I] 
[09/24/2021-18:07:29] [I] === Trace details ===
[09/24/2021-18:07:29] [I] Trace averages of 10 runs:
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42458 ms - Host latency: 4.57342 ms (end to end 4.58341 ms, enqueue 2.49678 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42445 ms - Host latency: 4.57319 ms (end to end 4.58299 ms, enqueue 2.44905 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42982 ms - Host latency: 4.57862 ms (end to end 4.58836 ms, enqueue 2.44425 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43179 ms - Host latency: 4.58066 ms (end to end 4.59049 ms, enqueue 2.37694 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.41842 ms - Host latency: 4.56768 ms (end to end 4.57617 ms, enqueue 2.43473 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4274 ms - Host latency: 4.57666 ms (end to end 4.58802 ms, enqueue 2.40548 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43199 ms - Host latency: 4.58074 ms (end to end 4.59091 ms, enqueue 2.37454 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42364 ms - Host latency: 4.57243 ms (end to end 4.58308 ms, enqueue 2.37195 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42254 ms - Host latency: 4.57175 ms (end to end 4.58223 ms, enqueue 2.35853 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42642 ms - Host latency: 4.57553 ms (end to end 4.58604 ms, enqueue 2.38159 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43264 ms - Host latency: 4.58171 ms (end to end 4.59057 ms, enqueue 2.37614 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43371 ms - Host latency: 4.58275 ms (end to end 4.59415 ms, enqueue 2.38123 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42706 ms - Host latency: 4.57629 ms (end to end 4.58818 ms, enqueue 2.3795 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4314 ms - Host latency: 4.58049 ms (end to end 4.5897 ms, enqueue 2.36934 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43033 ms - Host latency: 4.57953 ms (end to end 4.58964 ms, enqueue 2.34334 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42205 ms - Host latency: 4.57159 ms (end to end 4.58318 ms, enqueue 2.35987 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42767 ms - Host latency: 4.57712 ms (end to end 4.58633 ms, enqueue 2.36812 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42989 ms - Host latency: 4.57862 ms (end to end 4.58757 ms, enqueue 2.36015 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42661 ms - Host latency: 4.57537 ms (end to end 4.58437 ms, enqueue 2.34145 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42021 ms - Host latency: 4.56912 ms (end to end 4.57958 ms, enqueue 2.32709 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4288 ms - Host latency: 4.57828 ms (end to end 4.59021 ms, enqueue 2.36727 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43164 ms - Host latency: 4.58091 ms (end to end 4.59008 ms, enqueue 2.33909 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4317 ms - Host latency: 4.58011 ms (end to end 4.59044 ms, enqueue 2.34871 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42926 ms - Host latency: 4.57811 ms (end to end 4.5891 ms, enqueue 2.35013 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42932 ms - Host latency: 4.57816 ms (end to end 4.59026 ms, enqueue 2.33152 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42432 ms - Host latency: 4.57321 ms (end to end 4.58297 ms, enqueue 2.34402 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.41949 ms - Host latency: 4.56859 ms (end to end 4.57942 ms, enqueue 2.36777 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42769 ms - Host latency: 4.57688 ms (end to end 4.58684 ms, enqueue 2.32573 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42804 ms - Host latency: 4.57653 ms (end to end 4.58666 ms, enqueue 2.35432 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42909 ms - Host latency: 4.57881 ms (end to end 4.58712 ms, enqueue 2.34409 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43446 ms - Host latency: 4.58367 ms (end to end 4.5947 ms, enqueue 2.33566 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43483 ms - Host latency: 4.58311 ms (end to end 4.59275 ms, enqueue 2.32732 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42899 ms - Host latency: 4.57767 ms (end to end 4.58727 ms, enqueue 2.36718 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43308 ms - Host latency: 4.58248 ms (end to end 4.59464 ms, enqueue 2.32919 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42068 ms - Host latency: 4.56926 ms (end to end 4.58152 ms, enqueue 2.35371 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42954 ms - Host latency: 4.57822 ms (end to end 4.58896 ms, enqueue 2.32939 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42997 ms - Host latency: 4.57881 ms (end to end 4.59103 ms, enqueue 2.33153 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43575 ms - Host latency: 4.58513 ms (end to end 4.59782 ms, enqueue 2.32601 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42899 ms - Host latency: 4.5778 ms (end to end 4.59023 ms, enqueue 2.33546 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43223 ms - Host latency: 4.58132 ms (end to end 4.5939 ms, enqueue 2.33451 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42465 ms - Host latency: 4.57353 ms (end to end 4.58492 ms, enqueue 2.34061 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42732 ms - Host latency: 4.57659 ms (end to end 4.58796 ms, enqueue 2.3134 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4325 ms - Host latency: 4.58147 ms (end to end 4.59214 ms, enqueue 2.32388 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43293 ms - Host latency: 4.58176 ms (end to end 4.59224 ms, enqueue 2.33313 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43328 ms - Host latency: 4.58223 ms (end to end 4.59346 ms, enqueue 2.32554 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43113 ms - Host latency: 4.58047 ms (end to end 4.59277 ms, enqueue 2.33757 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43455 ms - Host latency: 4.58372 ms (end to end 4.59287 ms, enqueue 2.36929 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42737 ms - Host latency: 4.57642 ms (end to end 4.58772 ms, enqueue 2.33396 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42517 ms - Host latency: 4.57434 ms (end to end 4.58381 ms, enqueue 2.36916 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43267 ms - Host latency: 4.58123 ms (end to end 4.59226 ms, enqueue 2.38721 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42898 ms - Host latency: 4.57871 ms (end to end 4.58706 ms, enqueue 2.40481 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4322 ms - Host latency: 4.58076 ms (end to end 4.5915 ms, enqueue 2.38479 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42971 ms - Host latency: 4.57869 ms (end to end 4.58867 ms, enqueue 2.35176 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42729 ms - Host latency: 4.57593 ms (end to end 4.58616 ms, enqueue 2.3887 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42209 ms - Host latency: 4.57151 ms (end to end 4.58193 ms, enqueue 2.38823 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42397 ms - Host latency: 4.57271 ms (end to end 4.58149 ms, enqueue 2.37244 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.41929 ms - Host latency: 4.56826 ms (end to end 4.57866 ms, enqueue 2.39209 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4283 ms - Host latency: 4.57686 ms (end to end 4.58684 ms, enqueue 2.38665 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43154 ms - Host latency: 4.58074 ms (end to end 4.59014 ms, enqueue 2.34788 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43311 ms - Host latency: 4.58188 ms (end to end 4.59343 ms, enqueue 2.3259 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43086 ms - Host latency: 4.57993 ms (end to end 4.58945 ms, enqueue 2.34026 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42249 ms - Host latency: 4.57129 ms (end to end 4.58171 ms, enqueue 2.32581 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43198 ms - Host latency: 4.58069 ms (end to end 4.59089 ms, enqueue 2.33267 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43193 ms - Host latency: 4.58081 ms (end to end 4.59148 ms, enqueue 2.32795 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42749 ms - Host latency: 4.57637 ms (end to end 4.58777 ms, enqueue 2.35505 ms)
[09/24/2021-18:07:29] [I] 
[09/24/2021-18:07:29] [I] === Performance summary ===
[09/24/2021-18:07:29] [I] Throughput: 217.914 qps
[09/24/2021-18:07:29] [I] Latency: min = 4.54761 ms, max = 4.63867 ms, mean = 4.57764 ms, median = 4.57803 ms, percentile(99%) = 4.59985 ms
[09/24/2021-18:07:29] [I] End-to-End Host Latency: min = 4.55884 ms, max = 4.65112 ms, mean = 4.58815 ms, median = 4.58826 ms, percentile(99%) = 4.61398 ms
[09/24/2021-18:07:29] [I] Enqueue Time: min = 2.22827 ms, max = 2.61877 ms, mean = 2.35987 ms, median = 2.35272 ms, percentile(99%) = 2.57306 ms
[09/24/2021-18:07:29] [I] H2D Latency: min = 0.114258 ms, max = 0.126953 ms, mean = 0.115495 ms, median = 0.115479 ms, percentile(99%) = 0.117432 ms
[09/24/2021-18:07:29] [I] GPU Compute Time: min = 4.39941 ms, max = 4.49048 ms, mean = 4.42866 ms, median = 4.42896 ms, percentile(99%) = 4.45032 ms
[09/24/2021-18:07:29] [I] D2H Latency: min = 0.0314941 ms, max = 0.0356445 ms, mean = 0.0334863 ms, median = 0.0334473 ms, percentile(99%) = 0.0351562 ms
[09/24/2021-18:07:29] [I] Total Host Walltime: 3.01036 s
[09/24/2021-18:07:29] [I] Total GPU Compute Time: 2.9052 s
[09/24/2021-18:07:29] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/24/2021-18:07:29] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_15.onnx --int8
[09/24/2021-18:07:29] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 926, GPU 4509 (MiB)
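
The performance summary above can be cross-checked from the raw counts in the same log. The following is a minimal sketch, not part of trtexec: the function name `parse_summary`, the dictionary keys, and the regular expressions are illustrative assumptions. The log lines are copied verbatim from the output above. It assumes the reported throughput is the number of timed queries divided by the host walltime, and that total GPU compute time is the query count times the mean GPU compute time. The numbers in this log are consistent with both assumptions.

```python
import re

# Lines copied verbatim from the trtexec log above.
LOG_EXCERPT = """\
[09/24/2021-18:07:29] [I] Timing trace has 656 queries over 3.01036 s
[09/24/2021-18:07:29] [I] Throughput: 217.914 qps
[09/24/2021-18:07:29] [I] GPU Compute Time: min = 4.39941 ms, max = 4.49048 ms, mean = 4.42866 ms, median = 4.42896 ms, percentile(99%) = 4.45032 ms
[09/24/2021-18:07:29] [I] Total GPU Compute Time: 2.9052 s
"""


def parse_summary(text):
    """Extract a few summary numbers from trtexec-style log text.

    The patterns are written against the lines shown above; other
    trtexec versions may format these lines differently.
    """
    queries, walltime = re.search(
        r"Timing trace has (\d+) queries over ([\d.]+) s", text
    ).groups()
    throughput = re.search(r"Throughput: ([\d.]+) qps", text).group(1)
    mean_gpu_ms = re.search(
        r"GPU Compute Time: .*?mean = ([\d.]+) ms", text
    ).group(1)
    total_gpu_s = re.search(
        r"Total GPU Compute Time: ([\d.]+) s", text
    ).group(1)
    return {
        "queries": int(queries),
        "walltime_s": float(walltime),
        "throughput_qps": float(throughput),
        "mean_gpu_ms": float(mean_gpu_ms),
        "total_gpu_s": float(total_gpu_s),
    }


summary = parse_summary(LOG_EXCERPT)

# Assumed relation: throughput = timed queries / host walltime.
derived_qps = summary["queries"] / summary["walltime_s"]

# Assumed relation: total GPU time = queries * mean GPU compute time.
derived_total_gpu_s = summary["queries"] * summary["mean_gpu_ms"] / 1000.0

# Fraction of the host walltime during which the GPU was computing.
gpu_busy_fraction = summary["total_gpu_s"] / summary["walltime_s"]

print(f"derived throughput: {derived_qps:.3f} qps")
print(f"derived total GPU compute time: {derived_total_gpu_s:.4f} s")
print(f"GPU busy fraction: {gpu_busy_fraction:.3f}")
```

The derived values can be compared directly with the `Throughput` and `Total GPU Compute Time` lines in the summary. The busy fraction gives a rough idea of how much of the run was spent in GPU compute rather than in transfers or host overhead.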