I am running the latest version of TensorRT.
Could you explain the output of trtexec inference (especially the summary at the end), or point me to a link? Many thanks.
An example log is below.
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=regx800_no_DLA_int8.trt --streams=1
[04/19/2021-10:56:48] [I] === Model Options ===
[04/19/2021-10:56:48] [I] Format: *
[04/19/2021-10:56:48] [I] Model:
[04/19/2021-10:56:48] [I] Output:
[04/19/2021-10:56:48] [I] === Build Options ===
[04/19/2021-10:56:48] [I] Max batch: 1
[04/19/2021-10:56:48] [I] Workspace: 16 MB
[04/19/2021-10:56:48] [I] minTiming: 1
[04/19/2021-10:56:48] [I] avgTiming: 8
[04/19/2021-10:56:48] [I] Precision: FP32
[04/19/2021-10:56:48] [I] Calibration:
[04/19/2021-10:56:48] [I] Safe mode: Disabled
[04/19/2021-10:56:48] [I] Save engine:
[04/19/2021-10:56:48] [I] Load engine: regx800_no_DLA_int8.trt
[04/19/2021-10:56:48] [I] Builder Cache: Enabled
[04/19/2021-10:56:48] [I] NVTX verbosity: 0
[04/19/2021-10:56:48] [I] Inputs format: fp32:CHW
[04/19/2021-10:56:48] [I] Outputs format: fp32:CHW
[04/19/2021-10:56:48] [I] Input build shapes: model
[04/19/2021-10:56:48] [I] Input calibration shapes: model
[04/19/2021-10:56:48] [I] === System Options ===
[04/19/2021-10:56:48] [I] Device: 0
[04/19/2021-10:56:48] [I] DLACore:
[04/19/2021-10:56:48] [I] Plugins:
[04/19/2021-10:56:48] [I] === Inference Options ===
[04/19/2021-10:56:48] [I] Batch: 1
[04/19/2021-10:56:48] [I] Input inference shapes: model
[04/19/2021-10:56:48] [I] Iterations: 10
[04/19/2021-10:56:48] [I] Duration: 3s (+ 200ms warm up)
[04/19/2021-10:56:48] [I] Sleep time: 0ms
[04/19/2021-10:56:48] [I] Streams: 1
[04/19/2021-10:56:48] [I] ExposeDMA: Disabled
[04/19/2021-10:56:48] [I] Spin-wait: Disabled
[04/19/2021-10:56:48] [I] Multithreading: Disabled
[04/19/2021-10:56:48] [I] CUDA Graph: Disabled
[04/19/2021-10:56:48] [I] Skip inference: Disabled
[04/19/2021-10:56:48] [I] Inputs:
[04/19/2021-10:56:48] [I] === Reporting Options ===
[04/19/2021-10:56:48] [I] Verbose: Disabled
[04/19/2021-10:56:48] [I] Averages: 10 inferences
[04/19/2021-10:56:48] [I] Percentile: 99
[04/19/2021-10:56:48] [I] Dump output: Disabled
[04/19/2021-10:56:48] [I] Profile: Disabled
[04/19/2021-10:56:48] [I] Export timing to JSON file:
[04/19/2021-10:56:48] [I] Export output to JSON file:
[04/19/2021-10:56:48] [I] Export profile to JSON file:
[04/19/2021-10:56:48] [I]
[04/19/2021-10:56:51] [I] Starting inference threads
[04/19/2021-10:56:54] [I] Warmup completed 1 queries over 200 ms
[04/19/2021-10:56:54] [I] Timing trace has 107 queries over 2.45116 s
[04/19/2021-10:56:54] [I] Trace averages of 10 runs:
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 69.2957 ms - Host latency: 70.4439 ms (end to end 70.5048 ms, enqueue 6.28821 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 19.471 ms - Host latency: 19.7629 ms (end to end 19.8368 ms, enqueue 4.8366 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5684 ms - Host latency: 17.8248 ms (end to end 17.8362 ms, enqueue 5.16624 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5772 ms - Host latency: 17.8338 ms (end to end 17.843 ms, enqueue 5.12288 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5379 ms - Host latency: 17.7947 ms (end to end 17.8023 ms, enqueue 5.15026 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5469 ms - Host latency: 17.8036 ms (end to end 17.8123 ms, enqueue 5.10352 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5139 ms - Host latency: 17.7707 ms (end to end 17.7781 ms, enqueue 5.10715 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.4702 ms - Host latency: 17.7267 ms (end to end 17.7349 ms, enqueue 5.08018 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.4662 ms - Host latency: 17.7226 ms (end to end 17.7299 ms, enqueue 5.04275 ms)
[04/19/2021-10:56:54] [I] Average on 10 runs - GPU latency: 17.5454 ms - Host latency: 17.8022 ms (end to end 17.8109 ms, enqueue 5.04663 ms)
[04/19/2021-10:56:54] [I] Host Latency
[04/19/2021-10:56:54] [I] min: 17.6194 ms (end to end 17.6228 ms)
[04/19/2021-10:56:54] [I] max: 75.6416 ms (end to end 75.668 ms)
[04/19/2021-10:56:54] [I] mean: 22.8885 ms (end to end 22.908 ms)
[04/19/2021-10:56:54] [I] median: 17.8048 ms (end to end 17.8149 ms)
[04/19/2021-10:56:54] [I] percentile: 75.1704 ms at 99% (end to end 75.1823 ms at 99%)
[04/19/2021-10:56:54] [I] throughput: 43.6527 qps
[04/19/2021-10:56:54] [I] walltime: 2.45116 s
[04/19/2021-10:56:54] [I] Enqueue Time
[04/19/2021-10:56:54] [I] min: 3.61792 ms
[04/19/2021-10:56:54] [I] max: 7.94598 ms
[04/19/2021-10:56:54] [I] median: 5.52441 ms
[04/19/2021-10:56:54] [I] GPU Compute
[04/19/2021-10:56:54] [I] min: 17.3494 ms
[04/19/2021-10:56:54] [I] max: 74.4847 ms
[04/19/2021-10:56:54] [I] mean: 22.5451 ms
[04/19/2021-10:56:54] [I] median: 17.5398 ms
[04/19/2021-10:56:54] [I] percentile: 73.9891 ms at 99%
[04/19/2021-10:56:54] [I] total compute time: 2.41233 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=regx800_no_DLA_int8.trt --streams=1
@dusty_nv, please help. Thank you.
The output is essentially a timing report.
Host latency is the end-to-end execution time measured from the CPU's point of view.
GPU compute is the time the GPU actually spends on the calculation.
The benchmark runs inference multiple times (controlled by the iterations and duration options, shown as "Iterations: 10" and "Duration: 3s" in the log).
That is why it reports min/max/mean and median values.
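To compare the two sections side by side, the summary can be pulled out of the log with a small script. This is only a sketch: the log excerpt is copied from the post above, while the function name `parse_summary` and the regular expressions are my own and assume the log layout shown in this thread (other trtexec versions may print it differently).

```python
import re

# Excerpt copied from the summary at the end of the log above.
LOG = """\
[04/19/2021-10:56:54] [I] Host Latency
[04/19/2021-10:56:54] [I] min: 17.6194 ms (end to end 17.6228 ms)
[04/19/2021-10:56:54] [I] max: 75.6416 ms (end to end 75.668 ms)
[04/19/2021-10:56:54] [I] mean: 22.8885 ms (end to end 22.908 ms)
[04/19/2021-10:56:54] [I] median: 17.8048 ms (end to end 17.8149 ms)
[04/19/2021-10:56:54] [I] GPU Compute
[04/19/2021-10:56:54] [I] min: 17.3494 ms
[04/19/2021-10:56:54] [I] max: 74.4847 ms
[04/19/2021-10:56:54] [I] mean: 22.5451 ms
[04/19/2021-10:56:54] [I] median: 17.5398 ms
"""

# Section headers and statistic lines as they appear in this thread's logs
# (an assumption about the format, not an official specification).
SECTION = re.compile(r"\[I\] (Host Latency|Enqueue Time|GPU Compute)\s*$", re.IGNORECASE)
STAT = re.compile(r"\[I\] (min|max|mean|median): ([\d.]+) ms")


def parse_summary(text):
    """Return {section name: {statistic: milliseconds}} for the summary block."""
    summary = {}
    current = None
    for line in text.splitlines():
        section = SECTION.search(line)
        if section:
            current = section.group(1).lower()
            summary[current] = {}
            continue
        stat = STAT.search(line)
        if stat and current is not None:
            summary[current][stat.group(1)] = float(stat.group(2))
    return summary


stats = parse_summary(LOG)
# Difference between what the CPU observes and what the GPU spends computing.
overhead_ms = stats["host latency"]["median"] - stats["gpu compute"]["median"]
print(f"median host latency - median GPU compute = {overhead_ms:.4f} ms")
```

In this log the median host latency is only a fraction of a millisecond above the median GPU compute, so nearly all of the measured time is GPU work. The large max and 99% values come from the first batch of runs, which is visibly slower in the trace averages.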
Hi, just adding to this discussion: what does total compute time refer to?
These are the logs I am getting.
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v2-ssdlite/mobilenet-v2-ssdlite-trtexec-fp32.trt --batch=1
[07/30/2021-18:19:14] [I] === Model Options ===
[07/30/2021-18:19:14] [I] Format: *
[07/30/2021-18:19:14] [I] Model:
[07/30/2021-18:19:14] [I] Output:
[07/30/2021-18:19:14] [I] === Build Options ===
[07/30/2021-18:19:14] [I] Max batch: 1
[07/30/2021-18:19:14] [I] Workspace: 16 MB
[07/30/2021-18:19:14] [I] minTiming: 1
[07/30/2021-18:19:14] [I] avgTiming: 8
[07/30/2021-18:19:14] [I] Precision: FP32
[07/30/2021-18:19:14] [I] Calibration:
[07/30/2021-18:19:14] [I] Safe mode: Disabled
[07/30/2021-18:19:14] [I] Save engine:
[07/30/2021-18:19:14] [I] Load engine: /home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v2-ssdlite/mobilenet-v2-ssdlite-trtexec-fp32.trt
[07/30/2021-18:19:14] [I] Builder Cache: Enabled
[07/30/2021-18:19:14] [I] NVTX verbosity: 0
[07/30/2021-18:19:14] [I] Inputs format: fp32:CHW
[07/30/2021-18:19:14] [I] Outputs format: fp32:CHW
[07/30/2021-18:19:14] [I] Input build shapes: model
[07/30/2021-18:19:14] [I] Input calibration shapes: model
[07/30/2021-18:19:14] [I] === System Options ===
[07/30/2021-18:19:14] [I] Device: 0
[07/30/2021-18:19:14] [I] DLACore:
[07/30/2021-18:19:14] [I] Plugins:
[07/30/2021-18:19:14] [I] === Inference Options ===
[07/30/2021-18:19:14] [I] Batch: 1
[07/30/2021-18:19:14] [I] Input inference shapes: model
[07/30/2021-18:19:14] [I] Iterations: 10
[07/30/2021-18:19:14] [I] Duration: 3s (+ 200ms warm up)
[07/30/2021-18:19:14] [I] Sleep time: 0ms
[07/30/2021-18:19:14] [I] Streams: 1
[07/30/2021-18:19:14] [I] ExposeDMA: Disabled
[07/30/2021-18:19:14] [I] Spin-wait: Disabled
[07/30/2021-18:19:14] [I] Multithreading: Disabled
[07/30/2021-18:19:14] [I] CUDA Graph: Disabled
[07/30/2021-18:19:14] [I] Skip inference: Disabled
[07/30/2021-18:19:14] [I] Inputs:
[07/30/2021-18:19:14] [I] === Reporting Options ===
[07/30/2021-18:19:14] [I] Verbose: Disabled
[07/30/2021-18:19:14] [I] Averages: 10 inferences
[07/30/2021-18:19:14] [I] Percentile: 99
[07/30/2021-18:19:14] [I] Dump output: Disabled
[07/30/2021-18:19:14] [I] Profile: Disabled
[07/30/2021-18:19:14] [I] Export timing to JSON file:
[07/30/2021-18:19:14] [I] Export output to JSON file:
[07/30/2021-18:19:14] [I] Export profile to JSON file:
[07/30/2021-18:19:14] [I]
[07/30/2021-18:19:17] [I] Starting inference threads
[07/30/2021-18:19:20] [I] Warmup completed 1 queries over 200 ms
[07/30/2021-18:19:20] [I] Timing trace has 610 queries over 2.69807 s
[07/30/2021-18:19:20] [I] Trace averages of 10 runs:
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.2872 ms - Host latency: 4.33364 ms (end to end 4.34423 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.30612 ms - Host latency: 4.35173 ms (end to end 4.36381 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.32838 ms - Host latency: 4.37363 ms (end to end 4.38572 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.32194 ms - Host latency: 4.36702 ms (end to end 4.37739 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34443 ms - Host latency: 4.39061 ms (end to end 4.40264 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.33156 ms - Host latency: 4.37774 ms (end to end 4.38859 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.32841 ms - Host latency: 4.37469 ms (end to end 4.38506 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.3217 ms - Host latency: 4.36805 ms (end to end 4.38025 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.32236 ms - Host latency: 4.36834 ms (end to end 4.37947 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.3033 ms - Host latency: 4.34854 ms (end to end 4.35918 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.30023 ms - Host latency: 4.34626 ms (end to end 4.35703 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.30563 ms - Host latency: 4.35121 ms (end to end 4.36198 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.31742 ms - Host latency: 4.36305 ms (end to end 4.37406 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34239 ms - Host latency: 4.38904 ms (end to end 4.40028 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34396 ms - Host latency: 4.39023 ms (end to end 4.4017 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.35278 ms - Host latency: 4.39979 ms (end to end 4.41001 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.35287 ms - Host latency: 4.39883 ms (end to end 4.4092 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34816 ms - Host latency: 4.39468 ms (end to end 4.40531 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.35314 ms - Host latency: 4.39963 ms (end to end 4.40947 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.3744 ms - Host latency: 4.42037 ms (end to end 4.42996 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.37999 ms - Host latency: 4.42643 ms (end to end 4.43682 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.37872 ms - Host latency: 4.42509 ms (end to end 4.43496 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38658 ms - Host latency: 4.43326 ms (end to end 4.44401 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38939 ms - Host latency: 4.43604 ms (end to end 4.44724 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39353 ms - Host latency: 4.44033 ms (end to end 4.44982 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.41874 ms - Host latency: 4.46549 ms (end to end 4.47679 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39491 ms - Host latency: 4.44143 ms (end to end 4.45187 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.40007 ms - Host latency: 4.44659 ms (end to end 4.45743 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39281 ms - Host latency: 4.43981 ms (end to end 4.45066 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.4027 ms - Host latency: 4.44961 ms (end to end 4.45989 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.40023 ms - Host latency: 4.44717 ms (end to end 4.45852 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.41305 ms - Host latency: 4.46034 ms (end to end 4.47102 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39603 ms - Host latency: 4.44313 ms (end to end 4.45382 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.40184 ms - Host latency: 4.44941 ms (end to end 4.46113 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38324 ms - Host latency: 4.42989 ms (end to end 4.44091 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.36799 ms - Host latency: 4.41421 ms (end to end 4.42534 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.35134 ms - Host latency: 4.39795 ms (end to end 4.40845 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.3498 ms - Host latency: 4.39636 ms (end to end 4.40786 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.37107 ms - Host latency: 4.41675 ms (end to end 4.42815 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34622 ms - Host latency: 4.39238 ms (end to end 4.40217 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.36155 ms - Host latency: 4.40767 ms (end to end 4.42021 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34531 ms - Host latency: 4.39116 ms (end to end 4.40122 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34207 ms - Host latency: 4.38848 ms (end to end 4.39888 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.34807 ms - Host latency: 4.39446 ms (end to end 4.4042 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.36248 ms - Host latency: 4.4085 ms (end to end 4.41907 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.3616 ms - Host latency: 4.40801 ms (end to end 4.41912 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.37869 ms - Host latency: 4.42512 ms (end to end 4.43738 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38811 ms - Host latency: 4.43477 ms (end to end 4.44683 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39858 ms - Host latency: 4.44458 ms (end to end 4.45583 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39333 ms - Host latency: 4.43967 ms (end to end 4.44951 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39023 ms - Host latency: 4.43577 ms (end to end 4.44636 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38879 ms - Host latency: 4.43564 ms (end to end 4.44678 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.37009 ms - Host latency: 4.41755 ms (end to end 4.42747 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.40022 ms - Host latency: 4.44827 ms (end to end 4.45884 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.40286 ms - Host latency: 4.44871 ms (end to end 4.46074 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.42466 ms - Host latency: 4.47151 ms (end to end 4.48198 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.4009 ms - Host latency: 4.44778 ms (end to end 4.45979 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.39854 ms - Host latency: 4.4448 ms (end to end 4.4553 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38442 ms - Host latency: 4.43079 ms (end to end 4.44194 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.38901 ms - Host latency: 4.43535 ms (end to end 4.44756 ms)
[07/30/2021-18:19:20] [I] Average on 10 runs - GPU latency: 4.37397 ms - Host latency: 4.42034 ms (end to end 4.43242 ms)
[07/30/2021-18:19:20] [I] Host latency
[07/30/2021-18:19:20] [I] min: 4.30518 ms (end to end 4.31677 ms)
[07/30/2021-18:19:20] [I] max: 4.51245 ms (end to end 4.5271 ms)
[07/30/2021-18:19:20] [I] mean: 4.41209 ms (end to end 4.42301 ms)
[07/30/2021-18:19:20] [I] median: 4.41614 ms (end to end 4.42578 ms)
[07/30/2021-18:19:20] [I] percentile: 4.48828 ms at 99% (end to end 4.49646 ms at 99%)
[07/30/2021-18:19:20] [I] throughput: 226.088 qps
[07/30/2021-18:19:20] [I] walltime: 2.69807 s
[07/30/2021-18:19:20] [I] GPU Compute
[07/30/2021-18:19:20] [I] min: 4.2616 ms
[07/30/2021-18:19:20] [I] max: 4.46606 ms
[07/30/2021-18:19:20] [I] mean: 4.36571 ms
[07/30/2021-18:19:20] [I] median: 4.36963 ms
[07/30/2021-18:19:20] [I] percentile: 4.44055 ms at 99%
[07/30/2021-18:19:20] [I] total compute time: 2.66308 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v2-ssdlite/mobilenet-v2-ssdlite-trtexec-fp32.trt --batch=1
I noticed that in the logs posted above by @JeremyYuan, the total compute time (and the walltime, for that matter) is actually shorter even though the mean GPU compute time is much larger than mine. How should I interpret this?
Never mind, I figured it out. I just realized my model ran far more queries over a similar time period.
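For anyone else reading: the numbers in both logs are consistent with total compute time being the number of queries multiplied by the mean GPU compute time, and with throughput being the number of queries divided by the walltime. That is inferred from the figures in this thread rather than taken from the trtexec documentation, so treat the sketch below as a sanity check, not a definition. All values are copied from the two logs; the dictionary keys and the `derived` helper are just names I made up.

```python
# Values copied from the two logs in this thread.
runs = {
    "regx800 int8 (first log)": {
        "queries": 107,
        "mean_gpu_ms": 22.5451,
        "walltime_s": 2.45116,
        "reported_total_s": 2.41233,
        "reported_qps": 43.6527,
    },
    "mobilenet-v2-ssdlite fp32 (second log)": {
        "queries": 610,
        "mean_gpu_ms": 4.36571,
        "walltime_s": 2.69807,
        "reported_total_s": 2.66308,
        "reported_qps": 226.088,
    },
}


def derived(run):
    """Recompute total compute time (s) and throughput (qps) from the raw figures."""
    total_s = run["queries"] * run["mean_gpu_ms"] / 1000.0
    qps = run["queries"] / run["walltime_s"]
    return total_s, qps


for name, run in runs.items():
    total_s, qps = derived(run)
    print(
        f"{name}: total compute {total_s:.5f} s "
        f"(log: {run['reported_total_s']} s), "
        f"throughput {qps:.3f} qps (log: {run['reported_qps']} qps)"
    )
```

So a similar total compute time with a much smaller mean per query simply means many more queries fit into the same roughly 3 s measurement window (610 versus 107 here).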