Request for trtexec output explanation

I am running the latest version of TensorRT.
Could you explain the output of trtexec inference (especially the summary at the end), or point me to a relevant link? Many thanks.

An example log is below.
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=regx800_no_DLA_int8.trt --streams=1
[04/19/2021-10:56:48] [I] === Model Options ===
[04/19/2021-10:56:48] [I] Format: *
[04/19/2021-10:56:48] [I] Model:
[04/19/2021-10:56:48] [I] Output:
[04/19/2021-10:56:48] [I] === Build Options ===
[04/19/2021-10:56:48] [I] Max batch: 1
[04/19/2021-10:56:48] [I] Workspace: 16 MB
[04/19/2021-10:56:48] [I] minTiming: 1
[04/19/2021-10:56:48] [I] avgTiming: 8
[04/19/2021-10:56:48] [I] Precision: FP32
[04/19/2021-10:56:48] [I] Calibration:
[04/19/2021-10:56:48] [I] Safe mode: Disabled
[04/19/2021-10:56:48] [I] Save engine:
[04/19/2021-10:56:48] [I] Load engine: regx800_no_DLA_int8.trt
[04/19/2021-10:56:48] [I] Builder Cache: Enabled
[04/19/2021-10:56:48] [I] NVTX verbosity: 0
[04/19/2021-10:56:48] [I] Inputs format: fp32:CHW
[04/19/2021-10:56:48] [I] Outputs format: fp32:CHW
[04/19/2021-10:56:48] [I] Input build shapes: model
[04/19/2021-10:56:48] [I] Input calibration shapes: model
[04/19/2021-10:56:48] [I] === System Options ===
[04/19/2021-10:56:48] [I] Device: 0
[04/19/2021-10:56:48] [I] DLACore:
[04/19/2021-10:56:48] [I] Plugins:
[04/19/2021-10:56:48] [I] === Inference Options ===
[04/19/2021-10:56:48] [I] Batch: 1
[04/19/2021-10:56:48] [I] Input inference shapes: model
[04/19/2021-10:56:48] [I] Iterations: 10
[04/19/2021-10:56:48] [I] Duration: 3s (+ 200ms warm up)
[04/19/2021-10:56:48] [I] Sleep time: 0ms
[04/19/2021-10:56:48] [I] Streams: 1
[04/19/2021-10:56:48] [I] ExposeDMA: Disabled
[04/19/2021-10:56:48] [I] Spin-wait: Disabled
[04/19/2021-10:56:48] [I] Multithreading: Disabled
[04/19/2021-10:56:48] [I] CUDA Graph: Disabled
[04/19/2021-10:56:48] [I] Skip inference: Disabled
[04/19/2021-10:56:48] [I] Inputs:
[04/19/2021-10:56:48] [I] === Reporting Options ===
[04/19/2021-10:56:48] [I] Verbose: Disabled
[04/19/2021-10:56:48] [I] Averages: 10 inferences
[04/19/2021-10:56:48] [I] Percentile: 99
[04/19/2021-10:56:48] [I] Dump output: Disabled
[04/19/2021-10:56:48] [I] Profile: Disabled
[04/19/2021-10:56:48] [I] Export timing to JSON file:
[04/19/2021-10:56:48] [I] Export output to JSON file:
[04/19/2021-10:56:48] [I] Export profile to JSON file:
[04/19/2021-10:56:48] [I]
[04/19/2021-10:56:51] [I] Starting inference threads
[04/19/2021-10:56:54] [I] Warmup completed 1 queries over 200 ms
[04/19/2021-10:56:54] [I] Timing trace has 107 queries over 2.45116 s
[04/19/2021-10:56:54] [I] Trace averages of 10 runs:
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 69.2957 ms - Host latency: 70.4439 ms (end to end 70.5048 ms, enqueue 6.28821 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 19.471 ms - Host latency: 19.7629 ms (end to end 19.8368 ms, enqueue 4.8366 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5684 ms - Host latency: 17.8248 ms (end to end 17.8362 ms, enqueue 5.16624 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5772 ms - Host latency: 17.8338 ms (end to end 17.843 ms, enqueue 5.12288 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5379 ms - Host latency: 17.7947 ms (end to end 17.8023 ms, enqueue 5.15026 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5469 ms - Host latency: 17.8036 ms (end to end 17.8123 ms, enqueue 5.10352 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5139 ms - Host latency: 17.7707 ms (end to end 17.7781 ms, enqueue 5.10715 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.4702 ms - Host latency: 17.7267 ms (end to end 17.7349 ms, enqueue 5.08018 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.4662 ms - Host latency: 17.7226 ms (end to end 17.7299 ms, enqueue 5.04275 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5454 ms - Host latency: 17.8022 ms (end to end 17.8109 ms, enqueue 5.04663 ms)
[04/19/2021-10:56:54] [I] Host Latency
[04/19/2021-10:56:54] [I] min: 17.6194 ms (end to end 17.6228 ms)
[04/19/2021-10:56:54] [I] max: 75.6416 ms (end to end 75.668 ms)
[04/19/2021-10:56:54] [I] mean: 22.8885 ms (end to end 22.908 ms)
[04/19/2021-10:56:54] [I] median: 17.8048 ms (end to end 17.8149 ms)
[04/19/2021-10:56:54] [I] percentile: 75.1704 ms at 99% (end to end 75.1823 ms at 99%)
[04/19/2021-10:56:54] [I] throughput: 43.6527 qps
[04/19/2021-10:56:54] [I] walltime: 2.45116 s
[04/19/2021-10:56:54] [I] Enqueue Time
[04/19/2021-10:56:54] [I] min: 3.61792 ms
[04/19/2021-10:56:54] [I] max: 7.94598 ms
[04/19/2021-10:56:54] [I] median: 5.52441 ms
[04/19/2021-10:56:54] [I] GPU Compute
[04/19/2021-10:56:54] [I] min: 17.3494 ms
[04/19/2021-10:56:54] [I] max: 74.4847 ms
[04/19/2021-10:56:54] [I] mean: 22.5451 ms
[04/19/2021-10:56:54] [I] median: 17.5398 ms
[04/19/2021-10:56:54] [I] percentile: 73.9891 ms at 99%
[04/19/2021-10:56:54] [I] total compute time: 2.41233 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=regx800_no_DLA_int8.trt --streams=1

@dusty_nv please help. Thank you.

Hi,

The output is basically the execution time.

Host latency is the end-to-end execution time measured from the CPU's point of view, including the host-side overhead around each query.
GPU compute is the actual working time of the GPU calculation.

The benchmark runs the inference multiple times (controlled by the iterations argument), so the summary reports min/max/mean and median scores.

Thanks.
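To make the summary concrete, here is a small sketch (not trtexec's actual code) of how such a summary can be computed from a timing trace. The `summarize` helper and its percentile convention are assumptions for illustration; the final line checks the log's own numbers, where throughput is queries divided by wall time (107 queries over 2.45116 s).

```python
import statistics

def summarize(latencies_ms, walltime_s, percentile=99):
    """Summarize a latency trace the way a benchmark report might.

    Hypothetical helper: takes per-query latencies in milliseconds and the
    total wall time in seconds, returns min/max/mean/median, a percentile,
    and throughput in queries per second.
    """
    ordered = sorted(latencies_ms)
    # Percentile taken as the value at that rank in the sorted trace
    # (one plausible convention; trtexec's exact method may differ).
    idx = min(len(ordered) - 1, int(len(ordered) * percentile / 100))
    return {
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        f"p{percentile}_ms": ordered[idx],
        "throughput_qps": len(ordered) / walltime_s,
    }

# Cross-check against the log above: 107 queries over 2.45116 s
print(107 / 2.45116)  # ~43.65 qps, matching "throughput: 43.6527 qps"
```

Note that the mean (22.89 ms) is well above the median (17.80 ms) in the log because the first few warm-up-adjacent runs were much slower (~70 ms); the median is usually the more representative steady-state number.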

I see, thanks again.