&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_15.onnx --int8
[09/24/2021-18:04:06] [I] === Model Options ===
[09/24/2021-18:04:06] [I] Format: ONNX
[09/24/2021-18:04:06] [I] Model: /home/acer/nfs-share/epoch_15.onnx
[09/24/2021-18:04:06] [I] Output:
[09/24/2021-18:04:06] [I] === Build Options ===
[09/24/2021-18:04:06] [I] Max batch: explicit
[09/24/2021-18:04:06] [I] Workspace: 16 MiB
[09/24/2021-18:04:06] [I] minTiming: 1
[09/24/2021-18:04:06] [I] avgTiming: 8
[09/24/2021-18:04:06] [I] Precision: FP32+INT8
[09/24/2021-18:04:06] [I] Calibration: Dynamic
[09/24/2021-18:04:06] [I] Refit: Disabled
[09/24/2021-18:04:06] [I] Sparsity: Disabled
[09/24/2021-18:04:06] [I] Safe mode: Disabled
[09/24/2021-18:04:06] [I] Restricted mode: Disabled
[09/24/2021-18:04:06] [I] Save engine:
[09/24/2021-18:04:06] [I] Load engine:
[09/24/2021-18:04:06] [I] NVTX verbosity: 0
[09/24/2021-18:04:06] [I] Tactic sources: Using default tactic sources
[09/24/2021-18:04:06] [I] timingCacheMode: local
[09/24/2021-18:04:06] [I] timingCacheFile:
[09/24/2021-18:04:06] [I] Input(s)s format: fp32:CHW
[09/24/2021-18:04:06] [I] Output(s)s format: fp32:CHW
[09/24/2021-18:04:06] [I] Input build shapes: model
[09/24/2021-18:04:06] [I] Input calibration shapes: model
[09/24/2021-18:04:06] [I] === System Options ===
[09/24/2021-18:04:06] [I] Device: 0
[09/24/2021-18:04:06] [I] DLACore:
[09/24/2021-18:04:06] [I] Plugins:
[09/24/2021-18:04:06] [I] === Inference Options ===
[09/24/2021-18:04:06] [I] Batch: Explicit
[09/24/2021-18:04:06] [I] Input inference shapes: model
[09/24/2021-18:04:06] [I] Iterations: 10
[09/24/2021-18:04:06] [I] Duration: 3s (+ 200ms warm up)
[09/24/2021-18:04:06] [I] Sleep time: 0ms
[09/24/2021-18:04:06] [I] Streams: 1
[09/24/2021-18:04:06] [I] ExposeDMA: Disabled
[09/24/2021-18:04:06] [I] Data transfers: Enabled
[09/24/2021-18:04:06] [I] Spin-wait: Disabled
[09/24/2021-18:04:06] [I] Multithreading: Disabled
[09/24/2021-18:04:06] [I] CUDA Graph: Disabled
[09/24/2021-18:04:06] [I] Separate profiling: Disabled
[09/24/2021-18:04:06] [I] Time Deserialize: Disabled
[09/24/2021-18:04:06] [I] Time Refit: Disabled
[09/24/2021-18:04:06] [I] Skip inference: Disabled
[09/24/2021-18:04:06] [I] Inputs:
[09/24/2021-18:04:06] [I] === Reporting Options ===
[09/24/2021-18:04:06] [I] Verbose: Disabled
[09/24/2021-18:04:06] [I] Averages: 10 inferences
[09/24/2021-18:04:06] [I] Percentile: 99
[09/24/2021-18:04:06] [I] Dump refittable layers:Disabled
[09/24/2021-18:04:06] [I] Dump output: Disabled
[09/24/2021-18:04:06] [I] Profile: Disabled
[09/24/2021-18:04:06] [I] Export timing to JSON file:
[09/24/2021-18:04:06] [I] Export output to JSON file:
[09/24/2021-18:04:06] [I] Export profile to JSON file:
[09/24/2021-18:04:06] [I]
[09/24/2021-18:04:06] [I] === Device Information ===
[09/24/2021-18:04:06] [I] Selected Device: Xavier
[09/24/2021-18:04:06] [I] Compute Capability: 7.2
[09/24/2021-18:04:06] [I] SMs: 6
[09/24/2021-18:04:06] [I] Compute Clock Rate: 1.109 GHz
[09/24/2021-18:04:06] [I] Device Global Memory: 7773 MiB
[09/24/2021-18:04:06] [I] Shared Memory per SM: 96 KiB
[09/24/2021-18:04:06] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/24/2021-18:04:06] [I] Memory Clock Rate: 1.109 GHz
[09/24/2021-18:04:06] [I]
[09/24/2021-18:04:06] [I] TensorRT version: 8001
[09/24/2021-18:04:07] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 3738 (MiB)
[09/24/2021-18:04:07] [I] Start parsing network model
[09/24/2021-18:04:07] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-18:04:07] [I] [TRT] Input filename: /home/acer/nfs-share/epoch_15.onnx
[09/24/2021-18:04:07] [I] [TRT] ONNX IR version: 0.0.6
[09/24/2021-18:04:07] [I] [TRT] Opset version: 13
[09/24/2021-18:04:07] [I] [TRT] Producer name: pytorch
[09/24/2021-18:04:07] [I] [TRT] Producer version: 1.8
[09/24/2021-18:04:07] [I] [TRT] Domain:
[09/24/2021-18:04:07] [I] [TRT] Model version: 0
[09/24/2021-18:04:07] [I] [TRT] Doc string:
[09/24/2021-18:04:07] [I] [TRT] ----------------------------------------------------------------
[09/24/2021-18:04:07] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/24/2021-18:04:09] [I] Finish parsing network model
[09/24/2021-18:04:09] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 376, GPU 3747 (MiB)
[09/24/2021-18:04:09] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[09/24/2021-18:04:09] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 376 MiB, GPU 3747 MiB
[09/24/2021-18:04:09] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[09/24/2021-18:04:09] [I] [TRT] ---------- Layers Running on DLA ----------
[09/24/2021-18:04:09] [I] [TRT] ---------- Layers Running on GPU ----------
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] QuantizeLinear_2_quantize_scale_node
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.0.0.weight + QuantizeLinear_7_quantize_scale_node + Conv_9
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_11
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.1.0.weight + QuantizeLinear_19_quantize_scale_node + Conv_21
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_23
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.1.3.weight + QuantizeLinear_31_quantize_scale_node + Conv_33
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_35
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.2.0.weight + QuantizeLinear_43_quantize_scale_node + Conv_45
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_47
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.2.3.weight + QuantizeLinear_55_quantize_scale_node + Conv_57
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_59
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.3.0.weight + QuantizeLinear_67_quantize_scale_node + Conv_69
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_71
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.3.3.weight + QuantizeLinear_79_quantize_scale_node + Conv_81
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_83
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.4.0.weight + QuantizeLinear_91_quantize_scale_node + Conv_93
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_95
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.4.3.weight + QuantizeLinear_103_quantize_scale_node + Conv_105
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_107
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.5.0.weight + QuantizeLinear_115_quantize_scale_node + Conv_117
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_119
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage1.5.3.weight + QuantizeLinear_127_quantize_scale_node + Conv_129
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_131
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.0.0.weight + QuantizeLinear_139_quantize_scale_node + Conv_141
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.output1.0.weight + QuantizeLinear_331_quantize_scale_node + Conv_333
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_143
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.0.3.weight + QuantizeLinear_151_quantize_scale_node + Conv_153
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_155
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.1.0.weight + QuantizeLinear_163_quantize_scale_node + Conv_165
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_167
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.1.3.weight + QuantizeLinear_175_quantize_scale_node + Conv_177
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_179
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.2.0.weight + QuantizeLinear_187_quantize_scale_node + Conv_189
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_191
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.2.3.weight + QuantizeLinear_199_quantize_scale_node + Conv_201
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_203
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.3.0.weight + QuantizeLinear_211_quantize_scale_node + Conv_213
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_215
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.3.3.weight + QuantizeLinear_223_quantize_scale_node + Conv_225
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_227
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.4.0.weight + QuantizeLinear_235_quantize_scale_node + Conv_237
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_239
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.4.3.weight + QuantizeLinear_247_quantize_scale_node + Conv_249
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_251
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.5.0.weight + QuantizeLinear_259_quantize_scale_node + Conv_261
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_263
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage2.5.3.weight + QuantizeLinear_271_quantize_scale_node + Conv_273
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_275
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.0.0.weight + QuantizeLinear_283_quantize_scale_node + Conv_285
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.output2.0.weight + QuantizeLinear_343_quantize_scale_node + Conv_345
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_287
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.0.3.weight + QuantizeLinear_295_quantize_scale_node + Conv_297
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_299
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.1.0.weight + QuantizeLinear_307_quantize_scale_node + Conv_309
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_311
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] body.stage3.1.3.weight + QuantizeLinear_319_quantize_scale_node + Conv_321
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_323
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.output3.0.weight + QuantizeLinear_355_quantize_scale_node + Conv_357
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_359
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] QuantizeLinear_544_quantize_scale_node
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv3X3.0.weight + QuantizeLinear_549_quantize_scale_node + Conv_551
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv5X5_1.0.weight + QuantizeLinear_560_quantize_scale_node + Conv_562
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Resize_378
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(LeakyRelu_347, Add_379)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_564
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.merge2.0.weight + QuantizeLinear_387_quantize_scale_node + Conv_389
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv5X5_2.0.weight + QuantizeLinear_572_quantize_scale_node + Conv_574
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv7X7_2.0.weight + QuantizeLinear_583_quantize_scale_node + Conv_585
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_391
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_587
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] QuantizeLinear_485_quantize_scale_node
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv3X3.0.weight + QuantizeLinear_490_quantize_scale_node + Conv_492
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv5X5_1.0.weight + QuantizeLinear_501_quantize_scale_node + Conv_503
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh3.conv7x7_3.0.weight + QuantizeLinear_595_quantize_scale_node + Conv_597
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Resize_410
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(LeakyRelu_335, Add_411)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_505
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(Relu_600)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] fpn.merge1.0.weight + QuantizeLinear_419_quantize_scale_node + Conv_421
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv5X5_2.0.weight + QuantizeLinear_513_quantize_scale_node + Conv_515
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv7X7_2.0.weight + QuantizeLinear_524_quantize_scale_node + Conv_526
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] BboxHead.2.conv1x1.weight + QuantizeLinear_644_quantize_scale_node + Conv_646
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ClassHead.2.conv1x1.weight + QuantizeLinear_699_quantize_scale_node + Conv_701
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LandmarkHead.2.conv1x1.weight + QuantizeLinear_754_quantize_scale_node + Conv_756
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_423
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_528
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv3X3.0.weight + QuantizeLinear_431_quantize_scale_node + Conv_433
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv5X5_1.0.weight + QuantizeLinear_442_quantize_scale_node + Conv_444
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh2.conv7x7_3.0.weight + QuantizeLinear_536_quantize_scale_node + Conv_538
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_446
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_647 + Reshape_654
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_702 + Reshape_709
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_757 + Reshape_764
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(Relu_541)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv5X5_2.0.weight + QuantizeLinear_454_quantize_scale_node + Conv_456
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv7X7_2.0.weight + QuantizeLinear_465_quantize_scale_node + Conv_467
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] BboxHead.1.conv1x1.weight + QuantizeLinear_626_quantize_scale_node + Conv_628
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ClassHead.1.conv1x1.weight + QuantizeLinear_681_quantize_scale_node + Conv_683
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LandmarkHead.1.conv1x1.weight + QuantizeLinear_736_quantize_scale_node + Conv_738
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LeakyRelu_469
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ssh1.conv7x7_3.0.weight + QuantizeLinear_477_quantize_scale_node + Conv_479
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_629 + Reshape_636
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_684 + Reshape_691
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_739 + Reshape_746
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] PWN(Relu_482)
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] BboxHead.0.conv1x1.weight + QuantizeLinear_608_quantize_scale_node + Conv_610
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] ClassHead.0.conv1x1.weight + QuantizeLinear_663_quantize_scale_node + Conv_665
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] LandmarkHead.0.conv1x1.weight + QuantizeLinear_718_quantize_scale_node + Conv_720
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_611 + Reshape_618
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_666 + Reshape_673
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Transpose_721 + Reshape_728
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1128 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1154 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1180 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1207 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1233 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1259 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1286 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1312 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] 1338 copy
[09/24/2021-18:04:09] [I] [TRT] [GpuLayer] Softmax_766
[09/24/2021-18:04:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +230, now: CPU 605, GPU 3977 (MiB)
[09/24/2021-18:04:11] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +391, now: CPU 912, GPU 4368 (MiB)
[09/24/2021-18:04:11] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[09/24/2021-18:07:25] [I] [TRT] Detected 1 inputs and 9 output network tensors.
[09/24/2021-18:07:25] [I] [TRT] Total Host Persistent Memory: 115600
[09/24/2021-18:07:25] [I] [TRT] Total Device Persistent Memory: 1679872
[09/24/2021-18:07:25] [I] [TRT] Total Scratch Memory: 0
[09/24/2021-18:07:25] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 18 MiB
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 933, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 934, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 933, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 933, GPU 4507 (MiB)
[09/24/2021-18:07:25] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 931 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] [TRT] Loaded engine size: 4 MB
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 925 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 931, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 931, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 931, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 931 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] Engine built in 199.796 sec.
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 926 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 926, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 926, GPU 4507 (MiB)
[09/24/2021-18:07:26] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 926 MiB, GPU 4507 MiB
[09/24/2021-18:07:26] [I] Created input binding for inputs.1 with dimensions 1x3x640x352
[09/24/2021-18:07:26] [I] Created output binding for 1181 with dimensions 1x9240x4
[09/24/2021-18:07:26] [I] Created output binding for 1339 with dimensions 1x9240x10
[09/24/2021-18:07:26] [I] Created output binding for 1340 with dimensions 1x9240x2
[09/24/2021-18:07:26] [I] Starting inference
[09/24/2021-18:07:29] [I] Warmup completed 43 queries over 200 ms
[09/24/2021-18:07:29] [I] Timing trace has 656 queries over 3.01036 s
[09/24/2021-18:07:29] [I]
[09/24/2021-18:07:29] [I] === Trace details ===
[09/24/2021-18:07:29] [I] Trace averages of 10 runs:
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42458 ms - Host latency: 4.57342 ms (end to end 4.58341 ms, enqueue 2.49678 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42445 ms - Host latency: 4.57319 ms (end to end 4.58299 ms, enqueue 2.44905 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42982 ms - Host latency: 4.57862 ms (end to end 4.58836 ms, enqueue 2.44425 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43179 ms - Host latency: 4.58066 ms (end to end 4.59049 ms, enqueue 2.37694 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.41842 ms - Host latency: 4.56768 ms (end to end 4.57617 ms, enqueue 2.43473 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4274 ms - Host latency: 4.57666 ms (end to end 4.58802 ms, enqueue 2.40548 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43199 ms - Host latency: 4.58074 ms (end to end 4.59091 ms, enqueue 2.37454 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42364 ms - Host latency: 4.57243 ms (end to end 4.58308 ms, enqueue 2.37195 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42254 ms - Host latency: 4.57175 ms (end to end 4.58223 ms, enqueue 2.35853 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42642 ms - Host latency: 4.57553 ms (end to end 4.58604 ms, enqueue 2.38159 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43264 ms - Host latency: 4.58171 ms (end to end 4.59057 ms, enqueue 2.37614 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43371 ms - Host latency: 4.58275 ms (end to end 4.59415 ms, enqueue 2.38123 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42706 ms - Host latency: 4.57629 ms (end to end 4.58818 ms, enqueue 2.3795 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4314 ms - Host latency: 4.58049 ms (end to end 4.5897 ms, enqueue 2.36934 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43033 ms - Host latency: 4.57953 ms (end to end 4.58964 ms, enqueue 2.34334 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42205 ms - Host latency: 4.57159 ms (end to end 4.58318 ms, enqueue 2.35987 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42767 ms - Host latency: 4.57712 ms (end to end 4.58633 ms, enqueue 2.36812 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42989 ms - Host latency: 4.57862 ms (end to end 4.58757 ms, enqueue 2.36015 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42661 ms - Host latency: 4.57537 ms (end to end 4.58437 ms, enqueue 2.34145 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42021 ms - Host latency: 4.56912 ms (end to end 4.57958 ms, enqueue 2.32709 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4288 ms - Host latency: 4.57828 ms (end to end 4.59021 ms, enqueue 2.36727 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43164 ms - Host latency: 4.58091 ms (end to end 4.59008 ms, enqueue 2.33909 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4317 ms - Host latency: 4.58011 ms (end to end 4.59044 ms, enqueue 2.34871 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42926 ms - Host latency: 4.57811 ms (end to end 4.5891 ms, enqueue 2.35013 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42932 ms - Host latency: 4.57816 ms (end to end 4.59026 ms, enqueue 2.33152 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42432 ms - Host latency: 4.57321 ms (end to end 4.58297 ms, enqueue 2.34402 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.41949 ms - Host latency: 4.56859 ms (end to end 4.57942 ms, enqueue 2.36777 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42769 ms - Host latency: 4.57688 ms (end to end 4.58684 ms, enqueue 2.32573 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42804 ms - Host latency: 4.57653 ms (end to end 4.58666 ms, enqueue 2.35432 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42909 ms - Host latency: 4.57881 ms (end to end 4.58712 ms, enqueue 2.34409 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43446 ms - Host latency: 4.58367 ms (end to end 4.5947 ms, enqueue 2.33566 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43483 ms - Host latency: 4.58311 ms (end to end 4.59275 ms, enqueue 2.32732 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42899 ms - Host latency: 4.57767 ms (end to end 4.58727 ms, enqueue 2.36718 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43308 ms - Host latency: 4.58248 ms (end to end 4.59464 ms, enqueue 2.32919 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42068 ms - Host latency: 4.56926 ms (end to end 4.58152 ms, enqueue 2.35371 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42954 ms - Host latency: 4.57822 ms (end to end 4.58896 ms, enqueue 2.32939 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42997 ms - Host latency: 4.57881 ms (end to end 4.59103 ms, enqueue 2.33153 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43575 ms - Host latency: 4.58513 ms (end to end 4.59782 ms, enqueue 2.32601 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42899 ms - Host latency: 4.5778 ms (end to end 4.59023 ms, enqueue 2.33546 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43223 ms - Host latency: 4.58132 ms (end to end 4.5939 ms, enqueue 2.33451 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42465 ms - Host latency: 4.57353 ms (end to end 4.58492 ms, enqueue 2.34061 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42732 ms - Host latency: 4.57659 ms (end to end 4.58796 ms, enqueue 2.3134 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4325 ms - Host latency: 4.58147 ms (end to end 4.59214 ms, enqueue 2.32388 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43293 ms - Host latency: 4.58176 ms (end to end 4.59224 ms, enqueue 2.33313 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43328 ms - Host latency: 4.58223 ms (end to end 4.59346 ms, enqueue 2.32554 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43113 ms - Host latency: 4.58047 ms (end to end 4.59277 ms, enqueue 2.33757 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43455 ms - Host latency: 4.58372 ms (end to end 4.59287 ms, enqueue 2.36929 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42737 ms - Host latency: 4.57642 ms (end to end 4.58772 ms, enqueue 2.33396 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42517 ms - Host latency: 4.57434 ms (end to end 4.58381 ms, enqueue 2.36916 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43267 ms - Host latency: 4.58123 ms (end to end 4.59226 ms, enqueue 2.38721 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42898 ms - Host latency: 4.57871 ms (end to end 4.58706 ms, enqueue 2.40481 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4322 ms - Host latency: 4.58076 ms (end to end 4.5915 ms, enqueue 2.38479 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42971 ms - Host latency: 4.57869 ms (end to end 4.58867 ms, enqueue 2.35176 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42729 ms - Host latency: 4.57593 ms (end to end 4.58616 ms, enqueue 2.3887 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42209 ms - Host latency: 4.57151 ms (end to end 4.58193 ms, enqueue 2.38823 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42397 ms - Host latency: 4.57271 ms (end to end 4.58149 ms, enqueue 2.37244 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.41929 ms - Host latency: 4.56826 ms (end to end 4.57866 ms, enqueue 2.39209 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.4283 ms - Host latency: 4.57686 ms (end to end 4.58684 ms, enqueue 2.38665 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43154 ms - Host latency: 4.58074 ms (end to end 4.59014 ms, enqueue 2.34788 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43311 ms - Host latency: 4.58188 ms (end to end 4.59343 ms, enqueue 2.3259 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43086 ms - Host latency: 4.57993 ms (end to end 4.58945 ms, enqueue 2.34026 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42249 ms - Host latency: 4.57129 ms (end to end 4.58171 ms, enqueue 2.32581 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43198 ms - Host latency: 4.58069 ms (end to end 4.59089 ms, enqueue 2.33267 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.43193 ms - Host latency: 4.58081 ms (end to end 4.59148 ms, enqueue 2.32795 ms)
[09/24/2021-18:07:29] [I] Average on 10 runs - GPU latency: 4.42749 ms - Host latency: 4.57637 ms (end to end 4.58777 ms, enqueue 2.35505 ms)
[09/24/2021-18:07:29] [I]
[09/24/2021-18:07:29] [I] === Performance summary ===
[09/24/2021-18:07:29] [I] Throughput: 217.914 qps
[09/24/2021-18:07:29] [I] Latency: min = 4.54761 ms, max = 4.63867 ms, mean = 4.57764 ms, median = 4.57803 ms, percentile(99%) = 4.59985 ms
[09/24/2021-18:07:29] [I] End-to-End Host Latency: min = 4.55884 ms, max = 4.65112 ms, mean = 4.58815 ms, median = 4.58826 ms, percentile(99%) = 4.61398 ms
[09/24/2021-18:07:29] [I] Enqueue Time: min = 2.22827 ms, max = 2.61877 ms, mean = 2.35987 ms, median = 2.35272 ms, percentile(99%) = 2.57306 ms
[09/24/2021-18:07:29] [I] H2D Latency: min = 0.114258 ms, max = 0.126953 ms, mean = 0.115495 ms, median = 0.115479 ms, percentile(99%) = 0.117432 ms
[09/24/2021-18:07:29] [I] GPU Compute Time: min = 4.39941 ms, max = 4.49048 ms, mean = 4.42866 ms, median = 4.42896 ms, percentile(99%) = 4.45032 ms
[09/24/2021-18:07:29] [I] D2H Latency: min = 0.0314941 ms, max = 0.0356445 ms, mean = 0.0334863 ms, median = 0.0334473 ms, percentile(99%) = 0.0351562 ms
[09/24/2021-18:07:29] [I] Total Host Walltime: 3.01036 s
[09/24/2021-18:07:29] [I] Total GPU Compute Time: 2.9052 s
[09/24/2021-18:07:29] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/24/2021-18:07:29] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/acer/nfs-share/epoch_15.onnx --int8
[09/24/2021-18:07:29] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 926, GPU 4509 (MiB)