No performance difference between Float16 and Float32 optimized TensorRT models

I am currently using the Python API for TensorRT (ver. 7.1.0) to convert from ONNX (ver. 1.9) to TensorRT. I have two models: one with weights, parameters and inputs in Float16, and the other in Float32.

The models I am optimizing were originally based on the PyTorch implementations of SSD-Mobilenet-v1 and SSD-Mobilenet-v2. When running inference with both models on a set of images, I see that the GPU utilization is nearly the same (around 30-40%) and the frame rate is only about 70 fps.

Would really like some help in understanding why there was no performance bump. Thanks!

Hi,

Could you test it with trtexec and share the log with us first?

$  /usr/src/tensorrt/bin/trtexec --onnx=[onnx/model]               # fp32
$  /usr/src/tensorrt/bin/trtexec --onnx=[onnx/model] --fp16        # fp16

Thanks.

Posting the logs below.

The log for my Float32 TensorRT model:

user@localhost:~$ /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp32.trt --batch=1
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp32.trt --batch=1
[07/30/2021-08:49:24] [I] === Model Options ===
[07/30/2021-08:49:24] [I] Format: *
[07/30/2021-08:49:24] [I] Model:
[07/30/2021-08:49:24] [I] Output:
[07/30/2021-08:49:24] [I] === Build Options ===
[07/30/2021-08:49:24] [I] Max batch: 1
[07/30/2021-08:49:24] [I] Workspace: 16 MB
[07/30/2021-08:49:24] [I] minTiming: 1
[07/30/2021-08:49:24] [I] avgTiming: 8
[07/30/2021-08:49:24] [I] Precision: FP32
[07/30/2021-08:49:24] [I] Calibration:
[07/30/2021-08:49:24] [I] Safe mode: Disabled
[07/30/2021-08:49:24] [I] Save engine:
[07/30/2021-08:49:24] [I] Load engine: /home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp32.trt
[07/30/2021-08:49:24] [I] Builder Cache: Enabled
[07/30/2021-08:49:24] [I] NVTX verbosity: 0
[07/30/2021-08:49:24] [I] Inputs format: fp32:CHW
[07/30/2021-08:49:24] [I] Outputs format: fp32:CHW
[07/30/2021-08:49:24] [I] Input build shapes: model
[07/30/2021-08:49:24] [I] Input calibration shapes: model
[07/30/2021-08:49:24] [I] === System Options ===
[07/30/2021-08:49:24] [I] Device: 0
[07/30/2021-08:49:24] [I] DLACore:
[07/30/2021-08:49:24] [I] Plugins:
[07/30/2021-08:49:24] [I] === Inference Options ===
[07/30/2021-08:49:24] [I] Batch: 1
[07/30/2021-08:49:24] [I] Input inference shapes: model
[07/30/2021-08:49:24] [I] Iterations: 10
[07/30/2021-08:49:24] [I] Duration: 3s (+ 200ms warm up)
[07/30/2021-08:49:24] [I] Sleep time: 0ms
[07/30/2021-08:49:24] [I] Streams: 1
[07/30/2021-08:49:24] [I] ExposeDMA: Disabled
[07/30/2021-08:49:24] [I] Spin-wait: Disabled
[07/30/2021-08:49:24] [I] Multithreading: Disabled
[07/30/2021-08:49:24] [I] CUDA Graph: Disabled
[07/30/2021-08:49:24] [I] Skip inference: Disabled
[07/30/2021-08:49:24] [I] Inputs:
[07/30/2021-08:49:24] [I] === Reporting Options ===
[07/30/2021-08:49:24] [I] Verbose: Disabled
[07/30/2021-08:49:24] [I] Averages: 10 inferences
[07/30/2021-08:49:24] [I] Percentile: 99
[07/30/2021-08:49:24] [I] Dump output: Disabled
[07/30/2021-08:49:24] [I] Profile: Disabled
[07/30/2021-08:49:24] [I] Export timing to JSON file:
[07/30/2021-08:49:24] [I] Export output to JSON file:
[07/30/2021-08:49:24] [I] Export profile to JSON file:
[07/30/2021-08:49:24] [I]
[07/30/2021-08:49:29] [I] Starting inference threads
[07/30/2021-08:49:33] [I] Warmup completed 87 queries over 200 ms
[07/30/2021-08:49:33] [I] Timing trace has 1339 queries over 3.00464 s
[07/30/2021-08:49:33] [I] Trace averages of 10 runs:
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.11544 ms - Host latency: 2.16028 ms (end to end 2.17235 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.12184 ms - Host latency: 2.1667 ms (end to end 2.17881 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.13178 ms - Host latency: 2.17709 ms (end to end 2.18871 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.14521 ms - Host latency: 2.19062 ms (end to end 2.20093 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1476 ms - Host latency: 2.19301 ms (end to end 2.20551 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15789 ms - Host latency: 2.20334 ms (end to end 2.21478 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16225 ms - Host latency: 2.20853 ms (end to end 2.22051 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1648 ms - Host latency: 2.21057 ms (end to end 2.22222 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16377 ms - Host latency: 2.20956 ms (end to end 2.22211 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1611 ms - Host latency: 2.20697 ms (end to end 2.21986 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16052 ms - Host latency: 2.20619 ms (end to end 2.21777 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16098 ms - Host latency: 2.207 ms (end to end 2.22067 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16579 ms - Host latency: 2.21096 ms (end to end 2.2228 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15862 ms - Host latency: 2.20458 ms (end to end 2.21628 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16154 ms - Host latency: 2.20732 ms (end to end 2.21806 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15565 ms - Host latency: 2.20139 ms (end to end 2.21336 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15636 ms - Host latency: 2.20206 ms (end to end 2.21395 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16128 ms - Host latency: 2.20654 ms (end to end 2.21641 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15681 ms - Host latency: 2.20228 ms (end to end 2.21458 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15871 ms - Host latency: 2.20422 ms (end to end 2.21673 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15793 ms - Host latency: 2.20289 ms (end to end 2.21622 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15558 ms - Host latency: 2.2002 ms (end to end 2.2117 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16296 ms - Host latency: 2.20804 ms (end to end 2.21979 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16564 ms - Host latency: 2.21076 ms (end to end 2.22324 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1694 ms - Host latency: 2.2157 ms (end to end 2.22673 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16448 ms - Host latency: 2.2103 ms (end to end 2.22037 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18069 ms - Host latency: 2.22719 ms (end to end 2.23926 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17237 ms - Host latency: 2.21835 ms (end to end 2.22933 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18719 ms - Host latency: 2.23296 ms (end to end 2.24502 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17628 ms - Host latency: 2.22217 ms (end to end 2.23362 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17028 ms - Host latency: 2.21598 ms (end to end 2.22582 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17301 ms - Host latency: 2.2187 ms (end to end 2.22939 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17506 ms - Host latency: 2.22059 ms (end to end 2.23088 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16821 ms - Host latency: 2.21437 ms (end to end 2.22484 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16369 ms - Host latency: 2.20912 ms (end to end 2.22263 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15341 ms - Host latency: 2.1998 ms (end to end 2.21141 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16579 ms - Host latency: 2.21182 ms (end to end 2.22325 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.15301 ms - Host latency: 2.19835 ms (end to end 2.20998 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.14689 ms - Host latency: 2.19355 ms (end to end 2.2046 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1443 ms - Host latency: 2.19008 ms (end to end 2.20149 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16637 ms - Host latency: 2.2123 ms (end to end 2.22415 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16907 ms - Host latency: 2.21528 ms (end to end 2.22676 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16416 ms - Host latency: 2.21013 ms (end to end 2.22057 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16155 ms - Host latency: 2.20708 ms (end to end 2.21947 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.167 ms - Host latency: 2.21285 ms (end to end 2.22421 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18108 ms - Host latency: 2.22747 ms (end to end 2.23737 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17992 ms - Host latency: 2.22709 ms (end to end 2.23892 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16884 ms - Host latency: 2.21511 ms (end to end 2.22673 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16448 ms - Host latency: 2.21058 ms (end to end 2.22196 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17528 ms - Host latency: 2.22202 ms (end to end 2.23253 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18813 ms - Host latency: 2.23398 ms (end to end 2.24542 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19164 ms - Host latency: 2.23835 ms (end to end 2.2478 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1683 ms - Host latency: 2.21472 ms (end to end 2.22644 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.16997 ms - Host latency: 2.21643 ms (end to end 2.22771 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1894 ms - Host latency: 2.23615 ms (end to end 2.24757 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1905 ms - Host latency: 2.23633 ms (end to end 2.24668 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19286 ms - Host latency: 2.23851 ms (end to end 2.24996 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19336 ms - Host latency: 2.23878 ms (end to end 2.25023 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19197 ms - Host latency: 2.23917 ms (end to end 2.24995 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19191 ms - Host latency: 2.23827 ms (end to end 2.24941 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20983 ms - Host latency: 2.25662 ms (end to end 2.26644 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20533 ms - Host latency: 2.25127 ms (end to end 2.2618 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19998 ms - Host latency: 2.24679 ms (end to end 2.2559 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20563 ms - Host latency: 2.25272 ms (end to end 2.26429 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20286 ms - Host latency: 2.24969 ms (end to end 2.26099 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20311 ms - Host latency: 2.24978 ms (end to end 2.25985 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19941 ms - Host latency: 2.24532 ms (end to end 2.25654 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20267 ms - Host latency: 2.24901 ms (end to end 2.25906 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20292 ms - Host latency: 2.24969 ms (end to end 2.26083 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2031 ms - Host latency: 2.24963 ms (end to end 2.2608 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20164 ms - Host latency: 2.24822 ms (end to end 2.25992 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20686 ms - Host latency: 2.25325 ms (end to end 2.26414 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19512 ms - Host latency: 2.24139 ms (end to end 2.2521 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20593 ms - Host latency: 2.2522 ms (end to end 2.2635 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21053 ms - Host latency: 2.25751 ms (end to end 2.2672 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20928 ms - Host latency: 2.25608 ms (end to end 2.26781 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21147 ms - Host latency: 2.25728 ms (end to end 2.26953 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2024 ms - Host latency: 2.24919 ms (end to end 2.25952 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20304 ms - Host latency: 2.2499 ms (end to end 2.26107 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21083 ms - Host latency: 2.25771 ms (end to end 2.26996 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.22065 ms - Host latency: 2.26703 ms (end to end 2.27889 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.22467 ms - Host latency: 2.27129 ms (end to end 2.28185 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21489 ms - Host latency: 2.26135 ms (end to end 2.27135 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21841 ms - Host latency: 2.26497 ms (end to end 2.27683 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20825 ms - Host latency: 2.25535 ms (end to end 2.26511 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2145 ms - Host latency: 2.26147 ms (end to end 2.27351 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21843 ms - Host latency: 2.26567 ms (end to end 2.27629 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2189 ms - Host latency: 2.26558 ms (end to end 2.27595 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2186 ms - Host latency: 2.26577 ms (end to end 2.27661 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.22576 ms - Host latency: 2.27249 ms (end to end 2.28242 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21577 ms - Host latency: 2.26289 ms (end to end 2.27375 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21895 ms - Host latency: 2.26584 ms (end to end 2.278 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21389 ms - Host latency: 2.26023 ms (end to end 2.27288 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20461 ms - Host latency: 2.25034 ms (end to end 2.26194 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2124 ms - Host latency: 2.25933 ms (end to end 2.26809 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21421 ms - Host latency: 2.26096 ms (end to end 2.272 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20254 ms - Host latency: 2.24934 ms (end to end 2.26138 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21323 ms - Host latency: 2.26055 ms (end to end 2.26929 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19814 ms - Host latency: 2.24446 ms (end to end 2.25659 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20293 ms - Host latency: 2.2491 ms (end to end 2.26055 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20188 ms - Host latency: 2.2477 ms (end to end 2.25979 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21379 ms - Host latency: 2.2604 ms (end to end 2.27158 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20164 ms - Host latency: 2.2481 ms (end to end 2.25962 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20181 ms - Host latency: 2.24778 ms (end to end 2.25964 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20583 ms - Host latency: 2.25227 ms (end to end 2.26421 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19915 ms - Host latency: 2.24563 ms (end to end 2.25627 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20469 ms - Host latency: 2.25095 ms (end to end 2.26191 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19802 ms - Host latency: 2.24478 ms (end to end 2.25662 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20879 ms - Host latency: 2.25503 ms (end to end 2.26626 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20801 ms - Host latency: 2.25415 ms (end to end 2.26665 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19438 ms - Host latency: 2.24094 ms (end to end 2.25168 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.21272 ms - Host latency: 2.2593 ms (end to end 2.27053 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.2093 ms - Host latency: 2.25549 ms (end to end 2.26648 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20984 ms - Host latency: 2.25666 ms (end to end 2.26736 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20557 ms - Host latency: 2.252 ms (end to end 2.26365 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.20508 ms - Host latency: 2.2519 ms (end to end 2.2637 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1946 ms - Host latency: 2.24104 ms (end to end 2.25159 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19097 ms - Host latency: 2.23696 ms (end to end 2.24912 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1866 ms - Host latency: 2.23313 ms (end to end 2.24558 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18181 ms - Host latency: 2.22852 ms (end to end 2.23857 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.17683 ms - Host latency: 2.22253 ms (end to end 2.23374 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18284 ms - Host latency: 2.22869 ms (end to end 2.23953 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1905 ms - Host latency: 2.23638 ms (end to end 2.24565 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18772 ms - Host latency: 2.23416 ms (end to end 2.24656 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18691 ms - Host latency: 2.23325 ms (end to end 2.24592 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19224 ms - Host latency: 2.23818 ms (end to end 2.24895 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18682 ms - Host latency: 2.23298 ms (end to end 2.24495 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18411 ms - Host latency: 2.23118 ms (end to end 2.24141 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19048 ms - Host latency: 2.23672 ms (end to end 2.24878 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.1886 ms - Host latency: 2.23472 ms (end to end 2.24644 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18647 ms - Host latency: 2.23284 ms (end to end 2.24531 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.19099 ms - Host latency: 2.23704 ms (end to end 2.24993 ms)
[07/30/2021-08:49:33] [I] Average on 10 runs - GPU latency: 2.18704 ms - Host latency: 2.23386 ms (end to end 2.24631 ms)
[07/30/2021-08:49:33] [I] Host latency
[07/30/2021-08:49:33] [I] min: 2.14438 ms (end to end 2.15494 ms)
[07/30/2021-08:49:33] [I] max: 2.29395 ms (end to end 2.30835 ms)
[07/30/2021-08:49:33] [I] mean: 2.23257 ms (end to end 2.24389 ms)
[07/30/2021-08:49:33] [I] median: 2.23535 ms (end to end 2.24707 ms)
[07/30/2021-08:49:33] [I] percentile: 2.28223 ms at 99% (end to end 2.29321 ms at 99%)
[07/30/2021-08:49:33] [I] throughput: 445.645 qps
[07/30/2021-08:49:33] [I] walltime: 3.00464 s
[07/30/2021-08:49:33] [I] GPU Compute
[07/30/2021-08:49:33] [I] min: 2.09972 ms
[07/30/2021-08:49:33] [I] max: 2.24768 ms
[07/30/2021-08:49:33] [I] mean: 2.18636 ms
[07/30/2021-08:49:33] [I] median: 2.18921 ms
[07/30/2021-08:49:33] [I] percentile: 2.23486 ms at 99%
[07/30/2021-08:49:33] [I] total compute time: 2.92753 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp32.trt --batch=1
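
As a rough aid for reading the summary above, here is a minimal Python sketch that pulls the throughput and mean latencies out of a trtexec log and relates them to the roughly 70 fps seen in the application. The function names (`parse_trtexec_summary`, `engine_share_of_frame`) are hypothetical helpers written for this post, not part of TensorRT, and the sketch assumes the engine measured by trtexec behaves like the one the application runs.

```python
# Hypothetical helpers for reading a trtexec summary; not part of TensorRT.

# A few summary lines copied from the FP32 log above.
SAMPLE_LOG = """\
[07/30/2021-08:49:33] [I] Host latency
[07/30/2021-08:49:33] [I] mean: 2.23257 ms (end to end 2.24389 ms)
[07/30/2021-08:49:33] [I] throughput: 445.645 qps
[07/30/2021-08:49:33] [I] GPU Compute
[07/30/2021-08:49:33] [I] mean: 2.18636 ms
"""

# Maps the section headers printed by trtexec to the keys used in the result.
SECTION_KEYS = {
    "Host latency": "host_latency_mean_ms",
    "GPU Compute": "gpu_compute_mean_ms",
}


def parse_trtexec_summary(log_text):
    """Return throughput (qps) and per-section mean latencies (ms) from a log."""
    result = {}
    section = None
    for line in log_text.splitlines():
        # Drop the "[date] [I]" prefix, keeping only the message body.
        body = line.split("[I]", 1)[-1].strip()
        if body in SECTION_KEYS:
            section = body
        elif body.startswith("throughput:"):
            result["throughput_qps"] = float(body.split()[1])
        elif body.startswith("mean:") and section in SECTION_KEYS:
            result[SECTION_KEYS[section]] = float(body.split()[1])
    return result


def engine_share_of_frame(mean_latency_ms, app_fps):
    """Fraction of the per-frame time budget spent inside the engine."""
    frame_budget_ms = 1000.0 / app_fps
    return mean_latency_ms / frame_budget_ms


if __name__ == "__main__":
    summary = parse_trtexec_summary(SAMPLE_LOG)
    share = engine_share_of_frame(summary["host_latency_mean_ms"], 70.0)
    print(summary)
    print(f"engine share of a 70 fps frame: {share:.1%}")
```

At about 2.2 ms per inference the engine alone sustains a few hundred queries per second, while 70 fps corresponds to roughly 14 ms per frame. If the assumption above holds, most of the per-frame time is spent outside the engine, so a faster engine would change the observed frame rate only slightly.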

The log for the Float16 TensorRT model:

user@localhost:~$ /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp16.trt --batch=1 --fp16
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp16.trt --batch=1 --fp16
[07/30/2021-08:55:21] [I] === Model Options ===
[07/30/2021-08:55:21] [I] Format: *
[07/30/2021-08:55:21] [I] Model:
[07/30/2021-08:55:21] [I] Output:
[07/30/2021-08:55:21] [I] === Build Options ===
[07/30/2021-08:55:21] [I] Max batch: 1
[07/30/2021-08:55:21] [I] Workspace: 16 MB
[07/30/2021-08:55:21] [I] minTiming: 1
[07/30/2021-08:55:21] [I] avgTiming: 8
[07/30/2021-08:55:21] [I] Precision: FP32+FP16
[07/30/2021-08:55:21] [I] Calibration:
[07/30/2021-08:55:21] [I] Safe mode: Disabled
[07/30/2021-08:55:21] [I] Save engine:
[07/30/2021-08:55:21] [I] Load engine: /home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp16.trt
[07/30/2021-08:55:21] [I] Builder Cache: Enabled
[07/30/2021-08:55:21] [I] NVTX verbosity: 0
[07/30/2021-08:55:21] [I] Inputs format: fp32:CHW
[07/30/2021-08:55:21] [I] Outputs format: fp32:CHW
[07/30/2021-08:55:21] [I] Input build shapes: model
[07/30/2021-08:55:21] [I] Input calibration shapes: model
[07/30/2021-08:55:21] [I] === System Options ===
[07/30/2021-08:55:21] [I] Device: 0
[07/30/2021-08:55:21] [I] DLACore:
[07/30/2021-08:55:21] [I] Plugins:
[07/30/2021-08:55:21] [I] === Inference Options ===
[07/30/2021-08:55:21] [I] Batch: 1
[07/30/2021-08:55:21] [I] Input inference shapes: model
[07/30/2021-08:55:21] [I] Iterations: 10
[07/30/2021-08:55:21] [I] Duration: 3s (+ 200ms warm up)
[07/30/2021-08:55:21] [I] Sleep time: 0ms
[07/30/2021-08:55:21] [I] Streams: 1
[07/30/2021-08:55:21] [I] ExposeDMA: Disabled
[07/30/2021-08:55:21] [I] Spin-wait: Disabled
[07/30/2021-08:55:21] [I] Multithreading: Disabled
[07/30/2021-08:55:21] [I] CUDA Graph: Disabled
[07/30/2021-08:55:21] [I] Skip inference: Disabled
[07/30/2021-08:55:21] [I] Inputs:
[07/30/2021-08:55:21] [I] === Reporting Options ===
[07/30/2021-08:55:21] [I] Verbose: Disabled
[07/30/2021-08:55:21] [I] Averages: 10 inferences
[07/30/2021-08:55:21] [I] Percentile: 99
[07/30/2021-08:55:21] [I] Dump output: Disabled
[07/30/2021-08:55:21] [I] Profile: Disabled
[07/30/2021-08:55:21] [I] Export timing to JSON file:
[07/30/2021-08:55:21] [I] Export output to JSON file:
[07/30/2021-08:55:21] [I] Export profile to JSON file:
[07/30/2021-08:55:21] [I]
[07/30/2021-08:55:24] [I] Starting inference threads
[07/30/2021-08:55:27] [I] Warmup completed 91 queries over 200 ms
[07/30/2021-08:55:27] [I] Timing trace has 1356 queries over 3.00441 s
[07/30/2021-08:55:27] [I] Trace averages of 10 runs:
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16525 ms - Host latency: 2.19185 ms (end to end 2.20227 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16509 ms - Host latency: 2.19171 ms (end to end 2.20202 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16518 ms - Host latency: 2.19182 ms (end to end 2.20148 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16947 ms - Host latency: 2.19591 ms (end to end 2.20767 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16855 ms - Host latency: 2.19564 ms (end to end 2.20587 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16587 ms - Host latency: 2.19278 ms (end to end 2.20215 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16514 ms - Host latency: 2.19189 ms (end to end 2.20081 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16605 ms - Host latency: 2.19273 ms (end to end 2.20462 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16494 ms - Host latency: 2.1915 ms (end to end 2.20122 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16469 ms - Host latency: 2.19139 ms (end to end 2.20131 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16974 ms - Host latency: 2.19654 ms (end to end 2.20641 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16594 ms - Host latency: 2.19295 ms (end to end 2.20127 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16625 ms - Host latency: 2.19297 ms (end to end 2.20381 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16931 ms - Host latency: 2.19607 ms (end to end 2.20499 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17002 ms - Host latency: 2.19637 ms (end to end 2.20717 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17057 ms - Host latency: 2.19734 ms (end to end 2.20871 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16459 ms - Host latency: 2.19113 ms (end to end 2.20158 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1642 ms - Host latency: 2.19146 ms (end to end 2.20236 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17012 ms - Host latency: 2.19695 ms (end to end 2.20837 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17102 ms - Host latency: 2.19739 ms (end to end 2.20986 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16427 ms - Host latency: 2.19121 ms (end to end 2.20164 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1745 ms - Host latency: 2.20121 ms (end to end 2.21181 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16921 ms - Host latency: 2.19611 ms (end to end 2.20744 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17414 ms - Host latency: 2.20084 ms (end to end 2.21309 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17386 ms - Host latency: 2.20071 ms (end to end 2.21242 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17038 ms - Host latency: 2.19692 ms (end to end 2.20811 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16345 ms - Host latency: 2.19034 ms (end to end 2.20065 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16647 ms - Host latency: 2.19307 ms (end to end 2.20422 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16168 ms - Host latency: 2.18851 ms (end to end 2.19946 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1638 ms - Host latency: 2.19073 ms (end to end 2.2024 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1671 ms - Host latency: 2.19402 ms (end to end 2.20463 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16783 ms - Host latency: 2.19449 ms (end to end 2.20577 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16491 ms - Host latency: 2.19227 ms (end to end 2.20327 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1651 ms - Host latency: 2.19196 ms (end to end 2.20131 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1673 ms - Host latency: 2.19431 ms (end to end 2.20623 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17244 ms - Host latency: 2.19933 ms (end to end 2.2106 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1647 ms - Host latency: 2.19146 ms (end to end 2.20227 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16448 ms - Host latency: 2.19131 ms (end to end 2.20223 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16409 ms - Host latency: 2.19071 ms (end to end 2.20094 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1662 ms - Host latency: 2.19288 ms (end to end 2.20309 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16443 ms - Host latency: 2.19094 ms (end to end 2.20111 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16481 ms - Host latency: 2.19167 ms (end to end 2.20189 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16653 ms - Host latency: 2.19297 ms (end to end 2.20243 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16674 ms - Host latency: 2.19377 ms (end to end 2.20356 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16455 ms - Host latency: 2.19166 ms (end to end 2.20222 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16665 ms - Host latency: 2.19374 ms (end to end 2.20402 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17341 ms - Host latency: 2.19987 ms (end to end 2.21078 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17118 ms - Host latency: 2.19783 ms (end to end 2.20824 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17001 ms - Host latency: 2.19751 ms (end to end 2.20754 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16909 ms - Host latency: 2.19601 ms (end to end 2.20619 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16992 ms - Host latency: 2.19727 ms (end to end 2.20682 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17599 ms - Host latency: 2.20251 ms (end to end 2.21309 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17036 ms - Host latency: 2.19698 ms (end to end 2.20826 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17542 ms - Host latency: 2.20273 ms (end to end 2.21289 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17683 ms - Host latency: 2.20398 ms (end to end 2.21387 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17825 ms - Host latency: 2.20553 ms (end to end 2.21476 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18484 ms - Host latency: 2.21155 ms (end to end 2.22203 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1843 ms - Host latency: 2.2118 ms (end to end 2.22184 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18875 ms - Host latency: 2.21548 ms (end to end 2.22594 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1903 ms - Host latency: 2.21766 ms (end to end 2.22792 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18284 ms - Host latency: 2.21017 ms (end to end 2.22161 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18656 ms - Host latency: 2.21371 ms (end to end 2.22422 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18448 ms - Host latency: 2.21118 ms (end to end 2.21921 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18623 ms - Host latency: 2.21296 ms (end to end 2.2246 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18461 ms - Host latency: 2.21147 ms (end to end 2.22229 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18507 ms - Host latency: 2.21287 ms (end to end 2.22261 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18982 ms - Host latency: 2.21674 ms (end to end 2.22758 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18865 ms - Host latency: 2.21533 ms (end to end 2.22625 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18049 ms - Host latency: 2.20706 ms (end to end 2.21611 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18463 ms - Host latency: 2.21209 ms (end to end 2.22158 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18522 ms - Host latency: 2.21215 ms (end to end 2.22324 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19334 ms - Host latency: 2.22084 ms (end to end 2.23093 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19166 ms - Host latency: 2.21864 ms (end to end 2.2288 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17471 ms - Host latency: 2.20146 ms (end to end 2.21279 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18348 ms - Host latency: 2.21041 ms (end to end 2.22081 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19675 ms - Host latency: 2.22391 ms (end to end 2.23352 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18964 ms - Host latency: 2.21678 ms (end to end 2.22716 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18762 ms - Host latency: 2.21476 ms (end to end 2.22515 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19545 ms - Host latency: 2.22219 ms (end to end 2.23207 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18727 ms - Host latency: 2.21416 ms (end to end 2.22532 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17413 ms - Host latency: 2.20101 ms (end to end 2.21161 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.166 ms - Host latency: 2.19303 ms (end to end 2.20314 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17244 ms - Host latency: 2.19888 ms (end to end 2.20935 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16599 ms - Host latency: 2.19248 ms (end to end 2.20283 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17366 ms - Host latency: 2.20024 ms (end to end 2.21123 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16484 ms - Host latency: 2.19202 ms (end to end 2.20283 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16484 ms - Host latency: 2.19207 ms (end to end 2.20251 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16963 ms - Host latency: 2.19651 ms (end to end 2.20708 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16272 ms - Host latency: 2.18992 ms (end to end 2.19963 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16218 ms - Host latency: 2.18877 ms (end to end 2.19919 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17039 ms - Host latency: 2.19734 ms (end to end 2.2082 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16846 ms - Host latency: 2.19519 ms (end to end 2.20747 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16482 ms - Host latency: 2.19158 ms (end to end 2.2022 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.16887 ms - Host latency: 2.19531 ms (end to end 2.20627 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17346 ms - Host latency: 2.20044 ms (end to end 2.21018 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17024 ms - Host latency: 2.19746 ms (end to end 2.20662 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17871 ms - Host latency: 2.20522 ms (end to end 2.21577 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17356 ms - Host latency: 2.20051 ms (end to end 2.21111 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19053 ms - Host latency: 2.21792 ms (end to end 2.22878 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18394 ms - Host latency: 2.21108 ms (end to end 2.22214 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19141 ms - Host latency: 2.21841 ms (end to end 2.22859 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18499 ms - Host latency: 2.21177 ms (end to end 2.22217 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19351 ms - Host latency: 2.22048 ms (end to end 2.22986 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19719 ms - Host latency: 2.22417 ms (end to end 2.23452 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1949 ms - Host latency: 2.22205 ms (end to end 2.23162 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19827 ms - Host latency: 2.22571 ms (end to end 2.2355 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19866 ms - Host latency: 2.22524 ms (end to end 2.23459 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18987 ms - Host latency: 2.21667 ms (end to end 2.22729 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1908 ms - Host latency: 2.21831 ms (end to end 2.22896 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.20349 ms - Host latency: 2.23037 ms (end to end 2.24143 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19282 ms - Host latency: 2.22026 ms (end to end 2.23098 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.17969 ms - Host latency: 2.20667 ms (end to end 2.21792 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18818 ms - Host latency: 2.21531 ms (end to end 2.22622 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18743 ms - Host latency: 2.21416 ms (end to end 2.22432 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19431 ms - Host latency: 2.22134 ms (end to end 2.23274 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18777 ms - Host latency: 2.21458 ms (end to end 2.22498 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.1895 ms - Host latency: 2.21619 ms (end to end 2.22729 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19487 ms - Host latency: 2.2218 ms (end to end 2.23218 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19041 ms - Host latency: 2.21758 ms (end to end 2.22825 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19404 ms - Host latency: 2.22105 ms (end to end 2.23206 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19097 ms - Host latency: 2.21809 ms (end to end 2.22964 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19138 ms - Host latency: 2.21877 ms (end to end 2.22886 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19827 ms - Host latency: 2.22573 ms (end to end 2.23508 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19143 ms - Host latency: 2.21858 ms (end to end 2.23052 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19407 ms - Host latency: 2.22129 ms (end to end 2.2314 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18989 ms - Host latency: 2.21643 ms (end to end 2.22781 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19414 ms - Host latency: 2.22131 ms (end to end 2.23074 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18682 ms - Host latency: 2.21357 ms (end to end 2.22512 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18982 ms - Host latency: 2.21675 ms (end to end 2.22646 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19534 ms - Host latency: 2.22246 ms (end to end 2.23208 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19995 ms - Host latency: 2.22737 ms (end to end 2.23708 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19495 ms - Host latency: 2.22187 ms (end to end 2.23181 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19094 ms - Host latency: 2.21814 ms (end to end 2.22869 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.19221 ms - Host latency: 2.21919 ms (end to end 2.22976 ms)
[07/30/2021-08:55:27] [I] Average on 10 runs - GPU latency: 2.18867 ms - Host latency: 2.21592 ms (end to end 2.22561 ms)
[07/30/2021-08:55:27] [I] Host latency
[07/30/2021-08:55:27] [I] min: 2.15405 ms (end to end 2.16724 ms)
[07/30/2021-08:55:27] [I] max: 2.25122 ms (end to end 2.26758 ms)
[07/30/2021-08:55:27] [I] mean: 2.20512 ms (end to end 2.21559 ms)
[07/30/2021-08:55:27] [I] median: 2.20227 ms (end to end 2.21289 ms)
[07/30/2021-08:55:27] [I] percentile: 2.24365 ms at 99% (end to end 2.25464 ms at 99%)
[07/30/2021-08:55:27] [I] throughput: 451.336 qps
[07/30/2021-08:55:27] [I] walltime: 3.00441 s
[07/30/2021-08:55:27] [I] GPU Compute
[07/30/2021-08:55:27] [I] min: 2.12842 ms
[07/30/2021-08:55:27] [I] max: 2.22522 ms
[07/30/2021-08:55:27] [I] mean: 2.17819 ms
[07/30/2021-08:55:27] [I] median: 2.17531 ms
[07/30/2021-08:55:27] [I] percentile: 2.21655 ms at 99%
[07/30/2021-08:55:27] [I] total compute time: 2.95363 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/user/gazebo_dev/computer-vision-varun/models/mobilenet-v1-ssd/mobilenet-v1-ssd-trtexec-fp16.trt --batch=1 --fp16

My question is about the total compute time: it is very similar for the two engines. Is that how it's supposed to be? Also note that the float16 model was created with float16 data types in PyTorch itself, by using .half().

P.S.: The engine files in this case were actually generated using trtexec itself. Their runtimes were similar to those of the engines generated using the Python API.

GPU performance depends on a lot of things. With a 2ms running time, the model is not particularly large. Thus, it’s totally possible that you are bound on some other bottleneck, like CPU/GPU synchronization, or even just timing.
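To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It assumes the 70 fps application processes frames serially (pre-processing, inference and post-processing one after another) and that the GPU compute mean reported by trtexec above (2.17819 ms) also applies inside the application. Neither assumption is confirmed in this thread, and `projected_fps` is a hypothetical helper name.

```python
def projected_fps(app_fps, infer_ms, infer_speedup):
    """Estimate the frame rate after speeding up only the inference step.

    Assumes a serial pipeline: frame time = inference time + everything else.
    """
    frame_ms = 1000.0 / app_fps            # time budget per frame at the observed fps
    other_ms = frame_ms - infer_ms         # pre/post-processing, copies, sync, etc.
    new_frame_ms = other_ms + infer_ms / infer_speedup
    return 1000.0 / new_frame_ms


app_fps = 70.0     # frame rate reported in the original post
gpu_ms = 2.17819   # GPU compute mean from the trtexec log above

share = gpu_ms / (1000.0 / app_fps)
print(f"inference share of frame time: {share:.1%}")
print(f"fps if inference were 2x faster: {projected_fps(app_fps, gpu_ms, 2.0):.1f}")
```

Under these assumptions inference is only a small fraction of the per-frame time, so even a hypothetical 2x speedup of the engine would move the application frame rate by just a few fps. Profiling the rest of the pipeline is likely to be more informative than comparing precisions.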

fp16 runs faster than fp32 if you are compute bound on the ALUs or the memory streaming to/from those ALUs, and if the model doesn’t end up with higher data conversion costs because some layer/data/source/buffer is still 32-bit.
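To see whether fp16 bought anything at all, it can help to pull the summary statistics out of both trtexec logs and compare them side by side. The following is a minimal sketch that parses the "Host latency" and "GPU Compute" summary blocks in the format shown in the log above; `parse_trtexec_summary` is a hypothetical helper, and the regular expressions are assumptions based only on these log lines, so other trtexec versions may need adjustments.

```python
import re

# Section headers and per-statistic lines as they appear in the log above.
SECTION_RE = re.compile(r"\[I\] (Host latency|GPU Compute)\s*$")
STAT_RE = re.compile(r"\[I\] (min|max|mean|median): ([0-9.]+) ms")


def parse_trtexec_summary(log_text):
    """Return {section: {stat: milliseconds}} for the trtexec summary blocks."""
    stats = {}
    section = None
    for line in log_text.splitlines():
        header = SECTION_RE.search(line)
        if header:
            section = header.group(1)
            stats[section] = {}
            continue
        stat = STAT_RE.search(line)
        if stat and section is not None:
            stats[section][stat.group(1)] = float(stat.group(2))
    return stats


# Excerpt copied from the fp16 log above.
sample = """\
[07/30/2021-08:55:27] [I] Host latency
[07/30/2021-08:55:27] [I] mean: 2.20512 ms (end to end 2.21559 ms)
[07/30/2021-08:55:27] [I] median: 2.20227 ms (end to end 2.21289 ms)
[07/30/2021-08:55:27] [I] GPU Compute
[07/30/2021-08:55:27] [I] mean: 2.17819 ms
[07/30/2021-08:55:27] [I] median: 2.17531 ms
"""

summary = parse_trtexec_summary(sample)
print(summary["GPU Compute"]["mean"])
```

Running the same function on the fp32 log and dividing the two "GPU Compute" means gives the actual speedup. A ratio close to 1 would be consistent with the model not being compute bound, or with conversion costs cancelling out the gain.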
