Sharing the output of trtexec:
&&&& RUNNING TensorRT.trtexec [TensorRT v8205] # /usr/src/tensorrt/bin/trtexec --onnx=a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx --saveEngine=a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx_b1_gpu0_fp32.engine --explicitBatch
[08/01/2022-23:09:52] [W] --explicitBatch flag has been deprecated and has no effect!
[08/01/2022-23:09:52] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[08/01/2022-23:09:52] [I] === Model Options ===
[08/01/2022-23:09:52] [I] Format: ONNX
[08/01/2022-23:09:52] [I] Model: a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx
[08/01/2022-23:09:52] [I] Output:
[08/01/2022-23:09:52] [I] === Build Options ===
[08/01/2022-23:09:52] [I] Max batch: explicit batch
[08/01/2022-23:09:52] [I] Workspace: 16 MiB
[08/01/2022-23:09:52] [I] minTiming: 1
[08/01/2022-23:09:52] [I] avgTiming: 8
[08/01/2022-23:09:52] [I] Precision: FP32
[08/01/2022-23:09:52] [I] Calibration:
[08/01/2022-23:09:52] [I] Refit: Disabled
[08/01/2022-23:09:52] [I] Sparsity: Disabled
[08/01/2022-23:09:52] [I] Safe mode: Disabled
[08/01/2022-23:09:52] [I] DirectIO mode: Disabled
[08/01/2022-23:09:52] [I] Restricted mode: Disabled
[08/01/2022-23:09:52] [I] Save engine: a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx_b1_gpu0_fp32.engine
[08/01/2022-23:09:52] [I] Load engine:
[08/01/2022-23:09:52] [I] Profiling verbosity: 0
[08/01/2022-23:09:52] [I] Tactic sources: Using default tactic sources
[08/01/2022-23:09:52] [I] timingCacheMode: local
[08/01/2022-23:09:52] [I] timingCacheFile:
[08/01/2022-23:09:52] [I] Input(s)s format: fp32:CHW
[08/01/2022-23:09:52] [I] Output(s)s format: fp32:CHW
[08/01/2022-23:09:52] [I] Input build shapes: model
[08/01/2022-23:09:52] [I] Input calibration shapes: model
[08/01/2022-23:09:52] [I] === System Options ===
[08/01/2022-23:09:52] [I] Device: 0
[08/01/2022-23:09:52] [I] DLACore:
[08/01/2022-23:09:52] [I] Plugins:
[08/01/2022-23:09:52] [I] === Inference Options ===
[08/01/2022-23:09:52] [I] Batch: Explicit
[08/01/2022-23:09:52] [I] Input inference shapes: model
[08/01/2022-23:09:52] [I] Iterations: 10
[08/01/2022-23:09:52] [I] Duration: 3s (+ 200ms warm up)
[08/01/2022-23:09:52] [I] Sleep time: 0ms
[08/01/2022-23:09:52] [I] Idle time: 0ms
[08/01/2022-23:09:52] [I] Streams: 1
[08/01/2022-23:09:52] [I] ExposeDMA: Disabled
[08/01/2022-23:09:52] [I] Data transfers: Enabled
[08/01/2022-23:09:52] [I] Spin-wait: Disabled
[08/01/2022-23:09:52] [I] Multithreading: Disabled
[08/01/2022-23:09:52] [I] CUDA Graph: Disabled
[08/01/2022-23:09:52] [I] Separate profiling: Disabled
[08/01/2022-23:09:52] [I] Time Deserialize: Disabled
[08/01/2022-23:09:52] [I] Time Refit: Disabled
[08/01/2022-23:09:52] [I] Skip inference: Disabled
[08/01/2022-23:09:52] [I] Inputs:
[08/01/2022-23:09:52] [I] === Reporting Options ===
[08/01/2022-23:09:52] [I] Verbose: Disabled
[08/01/2022-23:09:52] [I] Averages: 10 inferences
[08/01/2022-23:09:52] [I] Percentile: 99
[08/01/2022-23:09:52] [I] Dump refittable layers:Disabled
[08/01/2022-23:09:52] [I] Dump output: Disabled
[08/01/2022-23:09:52] [I] Profile: Disabled
[08/01/2022-23:09:52] [I] Export timing to JSON file:
[08/01/2022-23:09:52] [I] Export output to JSON file:
[08/01/2022-23:09:52] [I] Export profile to JSON file:
[08/01/2022-23:09:52] [I]
[08/01/2022-23:09:52] [I] === Device Information ===
[08/01/2022-23:09:52] [I] Selected Device: NVIDIA GeForce GTX 1060 6GB
[08/01/2022-23:09:52] [I] Compute Capability: 6.1
[08/01/2022-23:09:52] [I] SMs: 10
[08/01/2022-23:09:52] [I] Compute Clock Rate: 1.7335 GHz
[08/01/2022-23:09:52] [I] Device Global Memory: 6070 MiB
[08/01/2022-23:09:52] [I] Shared Memory per SM: 96 KiB
[08/01/2022-23:09:52] [I] Memory Bus Width: 192 bits (ECC disabled)
[08/01/2022-23:09:52] [I] Memory Clock Rate: 4.004 GHz
[08/01/2022-23:09:52] [I]
[08/01/2022-23:09:52] [I] TensorRT version: 8.2.5
[08/01/2022-23:09:53] [I] [TRT] [MemUsageChange] Init CUDA: CPU +193, GPU +0, now: CPU 205, GPU 1542 (MiB)
[08/01/2022-23:09:53] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 205 MiB, GPU 1541 MiB
[08/01/2022-23:09:53] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 268 MiB, GPU 1541 MiB
[08/01/2022-23:09:53] [I] Start parsing network model
[08/01/2022-23:09:53] [I] [TRT] ----------------------------------------------------------------
[08/01/2022-23:09:53] [I] [TRT] Input filename: a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx
[08/01/2022-23:09:53] [I] [TRT] ONNX IR version: 0.0.6
[08/01/2022-23:09:53] [I] [TRT] Opset version: 11
[08/01/2022-23:09:53] [I] [TRT] Producer name:
[08/01/2022-23:09:53] [I] [TRT] Producer version:
[08/01/2022-23:09:53] [I] [TRT] Domain:
[08/01/2022-23:09:53] [I] [TRT] Model version: 0
[08/01/2022-23:09:53] [I] [TRT] Doc string:
[08/01/2022-23:09:53] [I] [TRT] ----------------------------------------------------------------
[08/01/2022-23:09:53] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
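The INT64 warning above is usually harmless for index/shape tensors: TensorRT stores them as INT32, so only values outside the INT32 range are at risk. A minimal numpy sketch (a hypothetical helper, not TensorRT's actual code) of what such a saturating down-cast looks like:

```python
import numpy as np

def cast_int64_to_int32(weights: np.ndarray) -> np.ndarray:
    """Clamp INT64 values into the INT32 range, then cast.

    Mirrors in spirit what onnx2trt_utils.cpp does when it
    "attempts to cast down to INT32": in-range values are kept,
    out-of-range values are saturated rather than wrapped.
    """
    i32 = np.iinfo(np.int32)
    return np.clip(weights, i32.min, i32.max).astype(np.int32)

# Example: an index tensor that happens to be exported as INT64.
w = np.array([0, 1, 2**40, -2**40], dtype=np.int64)
print(cast_int64_to_int32(w))  # in-range values survive, the rest saturate
```

If the cast ever loses real weight values, the fix is on the export side (e.g. regenerating the ONNX with INT32 tensors), not in TensorRT.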
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_02/1_dn_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_03/1_dn_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_04/1_dn_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_05/1_dn_lvl_3/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,64,64,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_06/1_up_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_07/1_up_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_08/1_up_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_09/1_up_lvl_7/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,4,4,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_10/2_dn_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_11/2_dn_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_12/2_dn_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_13/2_dn_lvl_3/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,64,64,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_14/2_up_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_15/2_up_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_16/2_up_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_17/2_up_lvl_7/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,4,4,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_18/3_dn_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_19/3_dn_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_20/3_dn_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_21/3_dn_lvl_3/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,64,64,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_22/3_up_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_23/3_up_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_24/3_up_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_25/3_up_lvl_7/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,4,4,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
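The broadcasting messages above are informational, not errors: each BiFPN fusion node multiplies a [1,H,W,64,k] feature stack by a [1,1,1,k,1] weight vector, and TensorRT broadcasts the weight's batch dims to match. The shape arithmetic can be checked with plain numpy (shapes copied from the node_02 message):

```python
import numpy as np

# From the log: node_02 multiplies [1,8,8,64,2] by [1,1,1,2,1].
a = np.random.rand(1, 8, 8, 64, 2).astype(np.float32)
b = np.random.rand(1, 1, 1, 2, 1).astype(np.float32)

# matmul contracts the last two dims (64,2)@(2,1) -> (64,1) and
# broadcasts b's leading [1,1,1] batch dims up to a's [1,8,8].
c = a @ b
print(c.shape)  # (1, 8, 8, 64, 1)
```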
[08/01/2022-23:09:53] [I] [TRT] No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[08/01/2022-23:09:53] [I] [TRT] Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace:
[08/01/2022-23:09:53] [W] [TRT] builtin_op_importers.cpp:4780: Attribute scoreBits not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[08/01/2022-23:09:53] [I] [TRT] Successfully created plugin: BatchedNMS_TRT
[08/01/2022-23:09:53] [I] Finish parsing network model
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_02/1_dn_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_03/1_dn_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_04/1_dn_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_05/1_dn_lvl_3/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,64,64,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_06/1_up_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_07/1_up_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_08/1_up_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_09/1_up_lvl_7/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,4,4,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_10/2_dn_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_11/2_dn_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_12/2_dn_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_13/2_dn_lvl_3/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,64,64,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_14/2_up_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_15/2_up_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_16/2_up_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_17/2_up_lvl_7/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,4,4,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_18/3_dn_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_19/3_dn_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_20/3_dn_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_21/3_dn_lvl_3/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,64,64,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_22/3_up_lvl_4/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,32,32,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_23/3_up_lvl_5/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,16,16,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_24/3_up_lvl_6/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,8,8,64,3][NONE] dims(input1)=[1,1,1,3,1][NONE].
[08/01/2022-23:09:53] [I] [TRT] StatefulPartitionedCall/EfficientDet-D0/bifpn/node_25/3_up_lvl_7/combine/MatMul: broadcasting input1 to make tensors conform, dims(input0)=[1,4,4,64,2][NONE] dims(input1)=[1,1,1,2,1][NONE].
[08/01/2022-23:09:54] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +270, GPU +112, now: CPU 563, GPU 1819 (MiB)
[08/01/2022-23:09:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +112, GPU +46, now: CPU 675, GPU 1865 (MiB)
[08/01/2022-23:09:55] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/01/2022-23:10:37] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
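This message ties back to the 16 MiB default workspace in the build options above: some faster tactics were skipped for lack of scratch space. On this trtexec version the limit is raised with `--workspace=N` (in MiB); the equivalent builder-config field takes bytes. A tiny conversion helper (hypothetical name `mib`), with the TensorRT call shown only as a comment since it needs a GPU:

```python
def mib(n: int) -> int:
    """Convert mebibytes to bytes, e.g. for a workspace-size limit."""
    return n << 20

# With the TensorRT 8.2 Python API this would be applied roughly as:
#   config.max_workspace_size = mib(1024)   # allow up to 1 GiB of tactic workspace
# or on the command line:
#   trtexec --workspace=1024 ...
print(mib(16))  # 16777216 -- the 16 MiB default from the log, in bytes
```

Raising the limit only permits larger allocations during tactic search; the built engine uses what it actually needs.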
[08/01/2022-23:12:24] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[08/01/2022-23:12:24] [I] [TRT] Total Host Persistent Memory: 316912
[08/01/2022-23:12:24] [I] [TRT] Total Device Persistent Memory: 13812736
[08/01/2022-23:12:24] [I] [TRT] Total Scratch Memory: 4346624
[08/01/2022-23:12:24] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 12 MiB, GPU 52 MiB
[08/01/2022-23:12:25] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 695.245ms to assign 16 blocks to 616 nodes requiring 54300675 bytes.
[08/01/2022-23:12:25] [I] [TRT] Total Activation Memory: 54300675
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1010, GPU 1303 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 1011, GPU 1313 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +3, GPU +18, now: CPU 3, GPU 18 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1005, GPU 1260 (MiB)
[08/01/2022-23:12:25] [I] [TRT] Loaded engine size: 21 MiB
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1027, GPU 1288 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1027, GPU 1296 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +17, now: CPU 0, GPU 17 (MiB)
[08/01/2022-23:12:25] [I] Engine built in 152.478 sec.
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 923, GPU 1289 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 923, GPU 1297 (MiB)
[08/01/2022-23:12:25] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +65, now: CPU 0, GPU 82 (MiB)
[08/01/2022-23:12:25] [I] Using random values for input input_tensor:0
[08/01/2022-23:12:25] [I] Created input binding for input_tensor:0 with dimensions 1x3x512x512
[08/01/2022-23:12:25] [I] Using random values for output num_detections
[08/01/2022-23:12:25] [I] Created output binding for num_detections with dimensions 1
[08/01/2022-23:12:25] [I] Using random values for output detection_boxes
[08/01/2022-23:12:25] [I] Created output binding for detection_boxes with dimensions 1x100x4
[08/01/2022-23:12:25] [I] Using random values for output detection_scores
[08/01/2022-23:12:25] [I] Created output binding for detection_scores with dimensions 1x100
[08/01/2022-23:12:25] [I] Using random values for output detection_classes
[08/01/2022-23:12:25] [I] Created output binding for detection_classes with dimensions 1x100
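The binding dump above fixes the engine's I/O contract: one 1x3x512x512 input and the four BatchedNMS outputs. Host-side buffers for those bindings could be preallocated as below; the shapes are copied from the log, while the dtypes are an assumption (BatchedNMS_TRT conventionally emits an int32 count and float32 elsewhere):

```python
import numpy as np

# Binding shapes copied from the log; dtypes assumed as noted above.
bindings = {
    "input_tensor:0":    ((1, 3, 512, 512), np.float32),
    "num_detections":    ((1,),             np.int32),
    "detection_boxes":   ((1, 100, 4),      np.float32),
    "detection_scores":  ((1, 100),         np.float32),
    "detection_classes": ((1, 100),         np.float32),
}

host_buffers = {name: np.empty(shape, dtype)
                for name, (shape, dtype) in bindings.items()}

total_bytes = sum(buf.nbytes for buf in host_buffers.values())
print(total_bytes)  # the input dominates: 3*512*512 floats are 3 MiB alone
```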
[08/01/2022-23:12:25] [I] Starting inference
[08/01/2022-23:12:28] [I] Warmup completed 14 queries over 200 ms
[08/01/2022-23:12:28] [I] Timing trace has 226 queries over 3.02941 s
[08/01/2022-23:12:28] [I]
[08/01/2022-23:12:28] [I] === Trace details ===
[08/01/2022-23:12:28] [I] Trace averages of 10 runs:
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.4645 ms - Host latency: 13.7185 ms (end to end 26.9152 ms, enqueue 8.85314 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.7576 ms - Host latency: 14.0139 ms (end to end 27.4643 ms, enqueue 8.80132 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.0679 ms - Host latency: 13.3217 ms (end to end 26.0629 ms, enqueue 8.58351 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.0766 ms - Host latency: 13.33 ms (end to end 26.0676 ms, enqueue 8.58005 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.0637 ms - Host latency: 13.3181 ms (end to end 25.8085 ms, enqueue 8.33931 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.3895 ms - Host latency: 13.6433 ms (end to end 26.6571 ms, enqueue 8.6418 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.6559 ms - Host latency: 13.9138 ms (end to end 27.0883 ms, enqueue 8.69785 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.2522 ms - Host latency: 13.5069 ms (end to end 26.4325 ms, enqueue 8.61039 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.0745 ms - Host latency: 13.3288 ms (end to end 26.0587 ms, enqueue 8.57388 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.0666 ms - Host latency: 13.3216 ms (end to end 25.9076 ms, enqueue 8.43093 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.2116 ms - Host latency: 13.4649 ms (end to end 26.3231 ms, enqueue 8.58948 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.4458 ms - Host latency: 13.7011 ms (end to end 26.748 ms, enqueue 8.81583 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.4537 ms - Host latency: 13.7085 ms (end to end 26.8575 ms, enqueue 8.77134 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.233 ms - Host latency: 13.4878 ms (end to end 26.3739 ms, enqueue 8.68862 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.6176 ms - Host latency: 13.8739 ms (end to end 27.1198 ms, enqueue 8.88249 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.0697 ms - Host latency: 13.3263 ms (end to end 25.9787 ms, enqueue 8.49829 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.7578 ms - Host latency: 14.0157 ms (end to end 27.3895 ms, enqueue 8.58496 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.2208 ms - Host latency: 13.4765 ms (end to end 26.2712 ms, enqueue 8.59319 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.2895 ms - Host latency: 13.5448 ms (end to end 26.633 ms, enqueue 8.89253 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.3224 ms - Host latency: 13.5785 ms (end to end 26.4882 ms, enqueue 8.63699 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.7057 ms - Host latency: 13.9638 ms (end to end 27.4405 ms, enqueue 8.9313 ms)
[08/01/2022-23:12:28] [I] Average on 10 runs - GPU latency: 13.4544 ms - Host latency: 13.7103 ms (end to end 26.8259 ms, enqueue 8.81165 ms)
[08/01/2022-23:12:28] [I]
[08/01/2022-23:12:28] [I] === Performance summary ===
[08/01/2022-23:12:28] [I] Throughput: 74.6021 qps
[08/01/2022-23:12:28] [I] Latency: min = 13.2803 ms, max = 17.0567 ms, mean = 13.5957 ms, median = 13.3355 ms, percentile(99%) = 15.1699 ms
[08/01/2022-23:12:28] [I] End-to-End Host Latency: min = 25.2949 ms, max = 30.1061 ms, mean = 26.5738 ms, median = 26.1086 ms, percentile(99%) = 29.1558 ms
[08/01/2022-23:12:28] [I] Enqueue Time: min = 7.83862 ms, max = 10.088 ms, mean = 8.671 ms, median = 8.59497 ms, percentile(99%) = 9.83936 ms
[08/01/2022-23:12:28] [I] H2D Latency: min = 0.244385 ms, max = 0.263428 ms, mean = 0.248713 ms, median = 0.247559 ms, percentile(99%) = 0.259155 ms
[08/01/2022-23:12:28] [I] GPU Compute Time: min = 13.0251 ms, max = 16.8042 ms, mean = 13.3404 ms, median = 13.0816 ms, percentile(99%) = 14.9084 ms
[08/01/2022-23:12:28] [I] D2H Latency: min = 0.00488281 ms, max = 0.00952148 ms, mean = 0.00658322 ms, median = 0.00622559 ms, percentile(99%) = 0.00927734 ms
[08/01/2022-23:12:28] [I] Total Host Walltime: 3.02941 s
[08/01/2022-23:12:28] [I] Total GPU Compute Time: 3.01492 s
[08/01/2022-23:12:28] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/01/2022-23:12:28] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8205] # /usr/src/tensorrt/bin/trtexec --onnx=a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx --saveEngine=a324eda4-2ed5-4227-b2ea-274ab2ebaf8b.onnx_b1_gpu0_fp32.engine --explicitBatch
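The summary figures are easy to sanity-check against the trace: throughput is just query count over walltime, and total GPU compute time nearly equals walltime because the single stream keeps the GPU busy. A quick cross-check with the values copied from the log:

```python
# Values copied from the performance summary above.
queries = 226          # "Timing trace has 226 queries over 3.02941 s"
walltime_s = 3.02941   # Total Host Walltime
gpu_compute_s = 3.01492  # Total GPU Compute Time

throughput_qps = queries / walltime_s
print(throughput_qps)  # matches the reported 74.6021 qps to within rounding

# GPU busy fraction over the trace -- near 100% for a single-stream run.
print(gpu_compute_s / walltime_s)
```

If higher throughput is needed, the usual next experiments are a larger `--workspace`, multiple `--streams`, or reduced precision, though FP16 gains little on a compute-capability 6.1 GPU like this GTX 1060.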