&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=test.engine --fp16
[02/10/2023-10:13:03] [I] === Model Options ===
[02/10/2023-10:13:03] [I] Format: *
[02/10/2023-10:13:03] [I] Model: 
[02/10/2023-10:13:03] [I] Output:
[02/10/2023-10:13:03] [I] === Build Options ===
[02/10/2023-10:13:03] [I] Max batch: 1
[02/10/2023-10:13:03] [I] Workspace: 16 MiB
[02/10/2023-10:13:03] [I] minTiming: 1
[02/10/2023-10:13:03] [I] avgTiming: 8
[02/10/2023-10:13:03] [I] Precision: FP32+FP16
[02/10/2023-10:13:03] [I] Calibration: 
[02/10/2023-10:13:03] [I] Refit: Disabled
[02/10/2023-10:13:03] [I] Sparsity: Disabled
[02/10/2023-10:13:03] [I] Safe mode: Disabled
[02/10/2023-10:13:03] [I] Restricted mode: Disabled
[02/10/2023-10:13:03] [I] Save engine: 
[02/10/2023-10:13:03] [I] Load engine: test.engine
[02/10/2023-10:13:03] [I] NVTX verbosity: 0
[02/10/2023-10:13:03] [I] Tactic sources: Using default tactic sources
[02/10/2023-10:13:03] [I] timingCacheMode: local
[02/10/2023-10:13:03] [I] timingCacheFile: 
[02/10/2023-10:13:03] [I] Input(s)s format: fp32:CHW
[02/10/2023-10:13:03] [I] Output(s)s format: fp32:CHW
[02/10/2023-10:13:03] [I] Input build shapes: model
[02/10/2023-10:13:03] [I] Input calibration shapes: model
[02/10/2023-10:13:03] [I] === System Options ===
[02/10/2023-10:13:03] [I] Device: 0
[02/10/2023-10:13:03] [I] DLACore: 
[02/10/2023-10:13:03] [I] Plugins:
[02/10/2023-10:13:03] [I] === Inference Options ===
[02/10/2023-10:13:03] [I] Batch: 1
[02/10/2023-10:13:03] [I] Input inference shapes: model
[02/10/2023-10:13:03] [I] Iterations: 10
[02/10/2023-10:13:03] [I] Duration: 3s (+ 200ms warm up)
[02/10/2023-10:13:03] [I] Sleep time: 0ms
[02/10/2023-10:13:03] [I] Streams: 1
[02/10/2023-10:13:03] [I] ExposeDMA: Disabled
[02/10/2023-10:13:03] [I] Data transfers: Enabled
[02/10/2023-10:13:03] [I] Spin-wait: Disabled
[02/10/2023-10:13:03] [I] Multithreading: Disabled
[02/10/2023-10:13:03] [I] CUDA Graph: Disabled
[02/10/2023-10:13:03] [I] Separate profiling: Disabled
[02/10/2023-10:13:03] [I] Time Deserialize: Disabled
[02/10/2023-10:13:03] [I] Time Refit: Disabled
[02/10/2023-10:13:03] [I] Skip inference: Disabled
[02/10/2023-10:13:03] [I] Inputs:
[02/10/2023-10:13:03] [I] === Reporting Options ===
[02/10/2023-10:13:03] [I] Verbose: Disabled
[02/10/2023-10:13:03] [I] Averages: 10 inferences
[02/10/2023-10:13:03] [I] Percentile: 99
[02/10/2023-10:13:03] [I] Dump refittable layers:Disabled
[02/10/2023-10:13:03] [I] Dump output: Disabled
[02/10/2023-10:13:03] [I] Profile: Disabled
[02/10/2023-10:13:03] [I] Export timing to JSON file: 
[02/10/2023-10:13:03] [I] Export output to JSON file: 
[02/10/2023-10:13:03] [I] Export profile to JSON file: 
[02/10/2023-10:13:03] [I] 
[02/10/2023-10:13:03] [I] === Device Information ===
[02/10/2023-10:13:03] [I] Selected Device: Xavier
[02/10/2023-10:13:03] [I] Compute Capability: 7.2
[02/10/2023-10:13:03] [I] SMs: 8
[02/10/2023-10:13:03] [I] Compute Clock Rate: 1.377 GHz
[02/10/2023-10:13:03] [I] Device Global Memory: 31920 MiB
[02/10/2023-10:13:03] [I] Shared Memory per SM: 96 KiB
[02/10/2023-10:13:03] [I] Memory Bus Width: 256 bits (ECC disabled)
[02/10/2023-10:13:03] [I] Memory Clock Rate: 1.377 GHz
[02/10/2023-10:13:03] [I] 
[02/10/2023-10:13:03] [I] TensorRT version: 8001
[02/10/2023-10:13:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 393, GPU 26558 (MiB)
[02/10/2023-10:13:04] [I] [TRT] Loaded engine size: 21 MB
[02/10/2023-10:13:04] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 393 MiB, GPU 26558 MiB
[02/10/2023-10:13:05] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +292, now: CPU 620, GPU 26871 (MiB)
[02/10/2023-10:13:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +406, now: CPU 927, GPU 27277 (MiB)
[02/10/2023-10:13:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 927, GPU 27259 (MiB)
[02/10/2023-10:13:06] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 927 MiB, GPU 27259 MiB
[02/10/2023-10:13:06] [I] Engine loaded in 3.52383 sec.
[02/10/2023-10:13:06] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 905 MiB, GPU 27237 MiB
[02/10/2023-10:13:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +9, now: CPU 905, GPU 27246 (MiB)
[02/10/2023-10:13:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 905, GPU 27256 (MiB)
[02/10/2023-10:13:06] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 905 MiB, GPU 27296 MiB
[02/10/2023-10:13:06] [I] Created input binding for images with dimensions 8x3x224x224
[02/10/2023-10:13:06] [I] Created output binding for output with dimensions 8x178
[02/10/2023-10:13:06] [I] Starting inference
[02/10/2023-10:13:10] [I] Warmup completed 21 queries over 200 ms
[02/10/2023-10:13:10] [I] Timing trace has 330 queries over 3.0203 s
[02/10/2023-10:13:10] [I] 
[02/10/2023-10:13:10] [I] === Trace details ===
[02/10/2023-10:13:10] [I] Trace averages of 10 runs:
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.27648 ms - Host latency: 8.4515 ms (end to end 8.99732 ms, enqueue 1.31788 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.01109 ms - Host latency: 9.18951 ms (end to end 9.88801 ms, enqueue 1.3698 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 5.93416 ms - Host latency: 6.08813 ms (end to end 6.20313 ms, enqueue 1.10374 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.19895 ms - Host latency: 8.36575 ms (end to end 8.79359 ms, enqueue 1.17054 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.37682 ms - Host latency: 8.53567 ms (end to end 9.00415 ms, enqueue 0.9242 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 10.5015 ms - Host latency: 10.6934 ms (end to end 11.6115 ms, enqueue 1.34518 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.70841 ms - Host latency: 7.87197 ms (end to end 8.22097 ms, enqueue 1.36581 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.11565 ms - Host latency: 9.29042 ms (end to end 9.86268 ms, enqueue 1.33974 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.42933 ms - Host latency: 9.60031 ms (end to end 10.1954 ms, enqueue 1.32951 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.22375 ms - Host latency: 7.38429 ms (end to end 7.66831 ms, enqueue 1.16653 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 10.0143 ms - Host latency: 10.1932 ms (end to end 10.8305 ms, enqueue 1.03258 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.25688 ms - Host latency: 8.42316 ms (end to end 8.92919 ms, enqueue 1.29407 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.37133 ms - Host latency: 8.5412 ms (end to end 9.1474 ms, enqueue 1.12433 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.38286 ms - Host latency: 8.54239 ms (end to end 8.99419 ms, enqueue 1.35699 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.06625 ms - Host latency: 8.22335 ms (end to end 8.57782 ms, enqueue 0.977612 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.21318 ms - Host latency: 9.38519 ms (end to end 9.87018 ms, enqueue 0.966064 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.85547 ms - Host latency: 8.01921 ms (end to end 8.39232 ms, enqueue 1.40781 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.88301 ms - Host latency: 8.04799 ms (end to end 8.36318 ms, enqueue 1.36058 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.97675 ms - Host latency: 9.14863 ms (end to end 9.72443 ms, enqueue 0.983142 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.03546 ms - Host latency: 9.20564 ms (end to end 9.72032 ms, enqueue 0.935388 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.86847 ms - Host latency: 8.03019 ms (end to end 8.57872 ms, enqueue 1.02579 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.86235 ms - Host latency: 10.0416 ms (end to end 10.8059 ms, enqueue 0.910327 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.00989 ms - Host latency: 9.17915 ms (end to end 9.7394 ms, enqueue 1.64666 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.98123 ms - Host latency: 8.14624 ms (end to end 8.6022 ms, enqueue 1.55574 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 6.51899 ms - Host latency: 6.67722 ms (end to end 6.80679 ms, enqueue 1.25439 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 10.1417 ms - Host latency: 10.3187 ms (end to end 11.0511 ms, enqueue 0.960962 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 5.89065 ms - Host latency: 6.04265 ms (end to end 6.13499 ms, enqueue 0.9323 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.82603 ms - Host latency: 8.99231 ms (end to end 9.50395 ms, enqueue 0.93147 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 8.55535 ms - Host latency: 8.72153 ms (end to end 9.21543 ms, enqueue 1.2731 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.29724 ms - Host latency: 9.47727 ms (end to end 10.1013 ms, enqueue 1.08289 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.54048 ms - Host latency: 9.71787 ms (end to end 10.2106 ms, enqueue 1.08235 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 7.42056 ms - Host latency: 7.57996 ms (end to end 8.02444 ms, enqueue 1.75847 ms)
[02/10/2023-10:13:10] [I] Average on 10 runs - GPU latency: 9.2322 ms - Host latency: 9.40649 ms (end to end 10.0583 ms, enqueue 0.949414 ms)
[02/10/2023-10:13:10] [I] 
[02/10/2023-10:13:10] [I] === Performance summary ===
[02/10/2023-10:13:10] [I] Throughput: 109.261 qps
[02/10/2023-10:13:10] [I] Latency: min = 5.21619 ms, max = 12.7375 ms, mean = 8.65249 ms, median = 8.21936 ms, percentile(99%) = 12.6218 ms
[02/10/2023-10:13:10] [I] End-to-End Host Latency: min = 5.22742 ms, max = 13.8433 ms, mean = 9.14629 ms, median = 8.59398 ms, percentile(99%) = 13.7045 ms
[02/10/2023-10:13:10] [I] Enqueue Time: min = 0.647583 ms, max = 6.09204 ms, mean = 1.18895 ms, median = 0.976318 ms, percentile(99%) = 4.50946 ms
[02/10/2023-10:13:10] [I] H2D Latency: min = 0.131226 ms, max = 0.272705 ms, mean = 0.166317 ms, median = 0.162323 ms, percentile(99%) = 0.231689 ms
[02/10/2023-10:13:10] [I] GPU Compute Time: min = 5.08264 ms, max = 12.5224 ms, mean = 8.48415 ms, median = 8.05795 ms, percentile(99%) = 12.4302 ms
[02/10/2023-10:13:10] [I] D2H Latency: min = 0.0012207 ms, max = 0.00390625 ms, mean = 0.00202484 ms, median = 0.00195312 ms, percentile(99%) = 0.00338745 ms
[02/10/2023-10:13:10] [I] Total Host Walltime: 3.0203 s
[02/10/2023-10:13:10] [I] Total GPU Compute Time: 2.79977 s
[02/10/2023-10:13:10] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/10/2023-10:13:10] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=test.engine --fp16
[02/10/2023-10:13:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 905, GPU 27256 (MiB)
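
The summary figures in the log above can be cross-checked against each other. The following is a minimal sketch, assuming that trtexec's reported throughput is simply the query count divided by the total host walltime, and that the mean GPU compute time is the total GPU compute time divided by the query count. The variable names are illustrative; the numeric values are copied from the log lines.

```python
# Values copied from the trtexec log above.
queries = 330                   # "Timing trace has 330 queries over 3.0203 s"
host_walltime_s = 3.0203        # "Total Host Walltime: 3.0203 s"
gpu_compute_total_s = 2.79977   # "Total GPU Compute Time: 2.79977 s"

# Assumption: throughput is queries per second of host walltime.
throughput_qps = queries / host_walltime_s

# Assumption: mean GPU compute time is the total spread evenly over all queries.
mean_gpu_compute_ms = gpu_compute_total_s / queries * 1000.0

# Fraction of the measured wall time during which the GPU was computing.
gpu_busy_fraction = gpu_compute_total_s / host_walltime_s

print(f"throughput:        {throughput_qps:.3f} qps")
print(f"mean GPU compute:  {mean_gpu_compute_ms:.5f} ms")
print(f"GPU busy fraction: {gpu_busy_fraction:.3f}")
```

Under these assumptions the computed values agree with the reported "Throughput: 109.261 qps" and the "mean = 8.48415 ms" GPU Compute Time to within rounding.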