&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --loadEngine=resnet50_sim_mod_GPU_fp16.trt --fp16 --dumpProfile
[05/17/2022-15:34:08] [I] === Model Options ===
[05/17/2022-15:34:08] [I] Format: *
[05/17/2022-15:34:08] [I] Model:
[05/17/2022-15:34:08] [I] Output:
[05/17/2022-15:34:08] [I] === Build Options ===
[05/17/2022-15:34:08] [I] Max batch: 1
[05/17/2022-15:34:08] [I] Workspace: 16 MiB
[05/17/2022-15:34:08] [I] minTiming: 1
[05/17/2022-15:34:08] [I] avgTiming: 8
[05/17/2022-15:34:08] [I] Precision: FP32+FP16
[05/17/2022-15:34:08] [I] Calibration:
[05/17/2022-15:34:08] [I] Refit: Disabled
[05/17/2022-15:34:08] [I] Sparsity: Disabled
[05/17/2022-15:34:08] [I] Safe mode: Disabled
[05/17/2022-15:34:08] [I] DirectIO mode: Disabled
[05/17/2022-15:34:08] [I] Restricted mode: Disabled
[05/17/2022-15:34:08] [I] Save engine:
[05/17/2022-15:34:08] [I] Load engine: resnet50_sim_mod_GPU_fp16.trt
[05/17/2022-15:34:08] [I] Profiling verbosity: 0
[05/17/2022-15:34:08] [I] Tactic sources: Using default tactic sources
[05/17/2022-15:34:08] [I] timingCacheMode: local
[05/17/2022-15:34:08] [I] timingCacheFile:
[05/17/2022-15:34:08] [I] Input(s)s format: fp32:CHW
[05/17/2022-15:34:08] [I] Output(s)s format: fp32:CHW
[05/17/2022-15:34:08] [I] Input build shapes: model
[05/17/2022-15:34:08] [I] Input calibration shapes: model
[05/17/2022-15:34:08] [I] === System Options ===
[05/17/2022-15:34:08] [I] Device: 0
[05/17/2022-15:34:08] [I] DLACore:
[05/17/2022-15:34:08] [I] Plugins:
[05/17/2022-15:34:08] [I] === Inference Options ===
[05/17/2022-15:34:08] [I] Batch: 1
[05/17/2022-15:34:08] [I] Input inference shapes: model
[05/17/2022-15:34:08] [I] Iterations: 10
[05/17/2022-15:34:08] [I] Duration: 3s (+ 200ms warm up)
[05/17/2022-15:34:08] [I] Sleep time: 0ms
[05/17/2022-15:34:08] [I] Idle time: 0ms
[05/17/2022-15:34:08] [I] Streams: 1
[05/17/2022-15:34:08] [I] ExposeDMA: Disabled
[05/17/2022-15:34:08] [I] Data transfers: Enabled
[05/17/2022-15:34:08] [I] Spin-wait: Disabled
[05/17/2022-15:34:08] [I] Multithreading: Disabled
[05/17/2022-15:34:08] [I] CUDA Graph: Disabled
[05/17/2022-15:34:08] [I] Separate profiling: Disabled
[05/17/2022-15:34:08] [I] Time Deserialize: Disabled
[05/17/2022-15:34:08] [I] Time Refit: Disabled
[05/17/2022-15:34:08] [I] Skip inference: Disabled
[05/17/2022-15:34:08] [I] Inputs:
[05/17/2022-15:34:08] [I] === Reporting Options ===
[05/17/2022-15:34:08] [I] Verbose: Disabled
[05/17/2022-15:34:08] [I] Averages: 10 inferences
[05/17/2022-15:34:08] [I] Percentile: 99
[05/17/2022-15:34:08] [I] Dump refittable layers:Disabled
[05/17/2022-15:34:08] [I] Dump output: Disabled
[05/17/2022-15:34:08] [I] Profile: Enabled
[05/17/2022-15:34:08] [I] Export timing to JSON file:
[05/17/2022-15:34:08] [I] Export output to JSON file:
[05/17/2022-15:34:08] [I] Export profile to JSON file:
[05/17/2022-15:34:08] [I]
[05/17/2022-15:34:08] [I] === Device Information ===
[05/17/2022-15:34:08] [I] Selected Device: Xavier
[05/17/2022-15:34:08] [I] Compute Capability: 7.2
[05/17/2022-15:34:08] [I] SMs: 8
[05/17/2022-15:34:08] [I] Compute Clock Rate: 1.377 GHz
[05/17/2022-15:34:08] [I] Device Global Memory: 15824 MiB
[05/17/2022-15:34:08] [I] Shared Memory per SM: 96 KiB
[05/17/2022-15:34:08] [I] Memory Bus Width: 256 bits (ECC disabled)
[05/17/2022-15:34:08] [I] Memory Clock Rate: 1.377 GHz
[05/17/2022-15:34:08] [I]
[05/17/2022-15:34:08] [I] TensorRT version: 8.2.1
[05/17/2022-15:34:09] [I] [TRT] [MemUsageChange] Init CUDA: CPU +362, GPU +0, now: CPU 428, GPU 3246 (MiB)
[05/17/2022-15:34:09] [I] [TRT] Loaded engine size: 47 MiB
[05/17/2022-15:34:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +228, now: CPU 659, GPU 3478 (MiB)
[05/17/2022-15:34:10] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +309, now: CPU 966, GPU 3787 (MiB)
[05/17/2022-15:34:10] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +44, now: CPU 0, GPU 44 (MiB)
[05/17/2022-15:34:10] [I] Engine loaded in 2.46549 sec.
[05/17/2022-15:34:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 919, GPU 3740 (MiB)
[05/17/2022-15:34:10] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 919, GPU 3740 (MiB)
[05/17/2022-15:34:10] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +49, now: CPU 0, GPU 93 (MiB)
[05/17/2022-15:34:10] [I] Using random values for input input
[05/17/2022-15:34:10] [I] Created input binding for input with dimensions 1x3x224x224
[05/17/2022-15:34:10] [I] Using random values for output output
[05/17/2022-15:34:10] [I] Created output binding for output with dimensions 1x2048x7x7
[05/17/2022-15:34:10] [I] Starting inference
[05/17/2022-15:34:13] [W] The network timing report will not be accurate due to extra synchronizations when profiler is enabled.
[05/17/2022-15:34:13] [W] Add --separateProfileRun to profile layer timing in a separate run.
[05/17/2022-15:34:13] [I] Warmup completed 67 queries over 200 ms
[05/17/2022-15:34:13] [I] Timing trace has 1018 queries over 3.00559 s
[05/17/2022-15:34:13] [I]
[05/17/2022-15:34:13] [I] === Trace details ===
[05/17/2022-15:34:13] [I] Trace averages of 10 runs:
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.89926 ms - Host latency: 2.95748 ms (end to end 2.97431 ms, enqueue 2.90366 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86617 ms - Host latency: 2.92184 ms (end to end 2.94234 ms, enqueue 2.87395 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.90221 ms - Host latency: 2.96032 ms (end to end 2.97206 ms, enqueue 2.90306 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.90306 ms - Host latency: 2.96273 ms (end to end 2.9778 ms, enqueue 2.90804 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.9031 ms - Host latency: 2.96362 ms (end to end 2.97623 ms, enqueue 2.90467 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 3.02353 ms - Host latency: 3.07431 ms (end to end 3.09025 ms, enqueue 3.02488 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88118 ms - Host latency: 2.93562 ms (end to end 2.95228 ms, enqueue 2.89065 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88494 ms - Host latency: 2.93971 ms (end to end 2.95415 ms, enqueue 2.88701 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.8803 ms - Host latency: 2.93448 ms (end to end 2.95072 ms, enqueue 2.88735 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88474 ms - Host latency: 2.9378 ms (end to end 2.95158 ms, enqueue 2.89188 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87664 ms - Host latency: 2.92867 ms (end to end 2.94229 ms, enqueue 2.88347 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88386 ms - Host latency: 2.93678 ms (end to end 2.9521 ms, enqueue 2.892 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88201 ms - Host latency: 2.93541 ms (end to end 2.95077 ms, enqueue 2.88823 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88074 ms - Host latency: 2.93599 ms (end to end 2.95114 ms, enqueue 2.88395 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86906 ms - Host latency: 2.92098 ms (end to end 2.93571 ms, enqueue 2.87891 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86257 ms - Host latency: 2.91634 ms (end to end 2.93233 ms, enqueue 2.87405 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.8753 ms - Host latency: 2.92903 ms (end to end 2.94368 ms, enqueue 2.8842 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87535 ms - Host latency: 2.92858 ms (end to end 2.9438 ms, enqueue 2.88605 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.8743 ms - Host latency: 2.92425 ms (end to end 2.93974 ms, enqueue 2.88536 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87623 ms - Host latency: 2.93015 ms (end to end 2.9448 ms, enqueue 2.88596 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86167 ms - Host latency: 2.91508 ms (end to end 2.93047 ms, enqueue 2.87277 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86331 ms - Host latency: 2.91505 ms (end to end 2.93033 ms, enqueue 2.87405 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87124 ms - Host latency: 2.92146 ms (end to end 2.93777 ms, enqueue 2.87889 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86897 ms - Host latency: 2.92216 ms (end to end 2.93703 ms, enqueue 2.87814 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.85886 ms - Host latency: 2.91129 ms (end to end 2.92525 ms, enqueue 2.8681 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86021 ms - Host latency: 2.91103 ms (end to end 2.92701 ms, enqueue 2.87221 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86671 ms - Host latency: 2.92081 ms (end to end 2.9364 ms, enqueue 2.87509 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86963 ms - Host latency: 2.92402 ms (end to end 2.9398 ms, enqueue 2.87736 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87422 ms - Host latency: 2.92809 ms (end to end 2.94235 ms, enqueue 2.88018 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87314 ms - Host latency: 2.92356 ms (end to end 2.93806 ms, enqueue 2.87974 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86818 ms - Host latency: 2.91882 ms (end to end 2.93505 ms, enqueue 2.87737 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87406 ms - Host latency: 2.92498 ms (end to end 2.93818 ms, enqueue 2.87883 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.99041 ms - Host latency: 3.0423 ms (end to end 3.05704 ms, enqueue 2.99994 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86624 ms - Host latency: 2.92496 ms (end to end 2.94188 ms, enqueue 2.87323 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86818 ms - Host latency: 2.91893 ms (end to end 2.93274 ms, enqueue 2.87891 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87087 ms - Host latency: 2.92025 ms (end to end 2.93451 ms, enqueue 2.8801 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87739 ms - Host latency: 2.92729 ms (end to end 2.94149 ms, enqueue 2.88628 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.8672 ms - Host latency: 2.91755 ms (end to end 2.935 ms, enqueue 2.87864 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87386 ms - Host latency: 2.92482 ms (end to end 2.93882 ms, enqueue 2.88333 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87162 ms - Host latency: 2.92146 ms (end to end 2.93804 ms, enqueue 2.87993 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88469 ms - Host latency: 2.93608 ms (end to end 2.95085 ms, enqueue 2.89587 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87877 ms - Host latency: 2.92921 ms (end to end 2.94298 ms, enqueue 2.88904 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87865 ms - Host latency: 2.92964 ms (end to end 2.94485 ms, enqueue 2.88955 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87012 ms - Host latency: 2.9204 ms (end to end 2.93402 ms, enqueue 2.87827 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86733 ms - Host latency: 2.91963 ms (end to end 2.93466 ms, enqueue 2.87832 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.96436 ms - Host latency: 3.0144 ms (end to end 3.0304 ms, enqueue 2.97598 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87559 ms - Host latency: 2.92654 ms (end to end 2.96493 ms, enqueue 2.91058 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87662 ms - Host latency: 2.92961 ms (end to end 2.94355 ms, enqueue 2.88556 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87839 ms - Host latency: 2.92887 ms (end to end 2.94398 ms, enqueue 2.88724 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87152 ms - Host latency: 2.92605 ms (end to end 2.94247 ms, enqueue 2.88365 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87498 ms - Host latency: 2.92495 ms (end to end 2.93981 ms, enqueue 2.88436 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87714 ms - Host latency: 2.93107 ms (end to end 2.94634 ms, enqueue 2.88809 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87427 ms - Host latency: 2.92928 ms (end to end 2.9444 ms, enqueue 2.8859 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88052 ms - Host latency: 2.93173 ms (end to end 2.94508 ms, enqueue 2.88893 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.8733 ms - Host latency: 2.92466 ms (end to end 2.9397 ms, enqueue 2.88549 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87352 ms - Host latency: 2.92832 ms (end to end 2.94552 ms, enqueue 2.88508 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86709 ms - Host latency: 2.91539 ms (end to end 2.93005 ms, enqueue 2.87876 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86886 ms - Host latency: 2.91692 ms (end to end 2.93147 ms, enqueue 2.87902 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87177 ms - Host latency: 2.92211 ms (end to end 2.93849 ms, enqueue 2.88257 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86998 ms - Host latency: 2.91853 ms (end to end 2.93334 ms, enqueue 2.88157 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87015 ms - Host latency: 2.91853 ms (end to end 2.9333 ms, enqueue 2.88047 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87059 ms - Host latency: 2.92092 ms (end to end 2.93524 ms, enqueue 2.88181 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87208 ms - Host latency: 2.91987 ms (end to end 2.93378 ms, enqueue 2.88191 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86626 ms - Host latency: 2.91492 ms (end to end 2.9304 ms, enqueue 2.87751 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86707 ms - Host latency: 2.91658 ms (end to end 2.93188 ms, enqueue 2.87898 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87639 ms - Host latency: 2.928 ms (end to end 2.94312 ms, enqueue 2.8876 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86929 ms - Host latency: 2.91711 ms (end to end 2.9325 ms, enqueue 2.8801 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87039 ms - Host latency: 2.92178 ms (end to end 2.93647 ms, enqueue 2.88135 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87476 ms - Host latency: 2.92625 ms (end to end 2.94167 ms, enqueue 2.88665 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87253 ms - Host latency: 2.92349 ms (end to end 2.9385 ms, enqueue 2.88357 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87112 ms - Host latency: 2.92231 ms (end to end 2.93691 ms, enqueue 2.88225 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88452 ms - Host latency: 2.93413 ms (end to end 2.94866 ms, enqueue 2.89519 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87258 ms - Host latency: 2.92263 ms (end to end 2.93982 ms, enqueue 2.88201 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.9959 ms - Host latency: 3.04426 ms (end to end 3.05876 ms, enqueue 3.00317 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87415 ms - Host latency: 2.9271 ms (end to end 2.94492 ms, enqueue 2.88628 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87842 ms - Host latency: 2.92656 ms (end to end 2.94126 ms, enqueue 2.88779 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87244 ms - Host latency: 2.92393 ms (end to end 2.94026 ms, enqueue 2.88367 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87302 ms - Host latency: 2.9219 ms (end to end 2.93623 ms, enqueue 2.88186 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87683 ms - Host latency: 2.92463 ms (end to end 2.93816 ms, enqueue 2.88547 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87722 ms - Host latency: 2.92544 ms (end to end 2.93816 ms, enqueue 2.88506 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87556 ms - Host latency: 2.92698 ms (end to end 2.94089 ms, enqueue 2.88438 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87683 ms - Host latency: 2.92473 ms (end to end 2.94026 ms, enqueue 2.88669 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.88091 ms - Host latency: 2.92988 ms (end to end 2.9448 ms, enqueue 2.89199 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87346 ms - Host latency: 2.92466 ms (end to end 2.94004 ms, enqueue 2.88594 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87559 ms - Host latency: 2.92576 ms (end to end 2.94092 ms, enqueue 2.88794 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87805 ms - Host latency: 2.92769 ms (end to end 2.94255 ms, enqueue 2.89023 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.96291 ms - Host latency: 3.01245 ms (end to end 3.02725 ms, enqueue 2.97056 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.90667 ms - Host latency: 2.9563 ms (end to end 2.97073 ms, enqueue 2.91853 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87461 ms - Host latency: 2.92527 ms (end to end 2.94167 ms, enqueue 2.88652 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87402 ms - Host latency: 2.92427 ms (end to end 2.9406 ms, enqueue 2.88584 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86921 ms - Host latency: 2.92048 ms (end to end 2.93467 ms, enqueue 2.8793 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86726 ms - Host latency: 2.91731 ms (end to end 2.93313 ms, enqueue 2.87971 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87092 ms - Host latency: 2.92073 ms (end to end 2.9353 ms, enqueue 2.88262 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87869 ms - Host latency: 2.93357 ms (end to end 2.94824 ms, enqueue 2.88792 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87361 ms - Host latency: 2.9229 ms (end to end 2.93911 ms, enqueue 2.88342 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87913 ms - Host latency: 2.92954 ms (end to end 2.94287 ms, enqueue 2.89099 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86743 ms - Host latency: 2.92021 ms (end to end 2.93486 ms, enqueue 2.87883 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.87275 ms - Host latency: 2.92324 ms (end to end 2.93967 ms, enqueue 2.88296 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86472 ms - Host latency: 2.9176 ms (end to end 2.93079 ms, enqueue 2.87544 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86938 ms - Host latency: 2.92146 ms (end to end 2.93823 ms, enqueue 2.88232 ms)
[05/17/2022-15:34:13] [I] Average on 10 runs - GPU latency: 2.86467 ms - Host latency: 2.91768 ms (end to end 2.93359 ms, enqueue 2.87688 ms)
[05/17/2022-15:34:13] [I]
[05/17/2022-15:34:13] [I] === Performance summary ===
[05/17/2022-15:34:13] [I] Throughput: 338.702 qps
[05/17/2022-15:34:13] [I] Latency: min = 2.84549 ms, max = 3.83386 ms, mean = 2.93167 ms, median = 2.92358 ms, percentile(99%) = 3.13593 ms
[05/17/2022-15:34:13] [I] End-to-End Host Latency: min = 2.8606 ms, max = 3.85193 ms, mean = 2.94698 ms, median = 2.93848 ms, percentile(99%) = 3.14789 ms
[05/17/2022-15:34:13] [I] Enqueue Time: min = 2.84351 ms, max = 3.78979 ms, mean = 2.8898 ms, median = 2.88196 ms, percentile(99%) = 3.13696 ms
[05/17/2022-15:34:13] [I] H2D Latency: min = 0.0189819 ms, max = 0.0706787 ms, mean = 0.0270949 ms, median = 0.026123 ms, percentile(99%) = 0.0455933 ms
[05/17/2022-15:34:13] [I] GPU Compute Time: min = 2.80261 ms, max = 3.77844 ms, mean = 2.88004 ms, median = 2.87207 ms, percentile(99%) = 3.09708 ms
[05/17/2022-15:34:13] [I] D2H Latency: min = 0.0119629 ms, max = 0.0681152 ms, mean = 0.0245367 ms, median = 0.0234375 ms, percentile(99%) = 0.0473633 ms
[05/17/2022-15:34:13] [I] Total Host Walltime: 3.00559 s
[05/17/2022-15:34:13] [I] Total GPU Compute Time: 2.93188 s
[05/17/2022-15:34:13] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[05/17/2022-15:34:13] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[05/17/2022-15:34:13] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/17/2022-15:34:13] [I]
[05/17/2022-15:34:13] [I]
[05/17/2022-15:34:13] [I] === Profile (1085 iterations ) ===
[05/17/2022-15:34:13] [I] Layer  Time (ms)  Avg. Time (ms)  Time %
[05/17/2022-15:34:13] [I] Reformatting CopyNode for Input Tensor 0 to Conv_0 + Relu_1  75.51  0.0696  2.4
[05/17/2022-15:34:13] [I] Conv_0 + Relu_1  95.42  0.0879  3.1
[05/17/2022-15:34:13] [I] MaxPool_2  28.78  0.0265  0.9
[05/17/2022-15:34:13] [I] Conv_3 + Relu_4  21.02  0.0194  0.7
[05/17/2022-15:34:13] [I] Conv_5 + Relu_6  53.09  0.0489  1.7
[05/17/2022-15:34:13] [I] Conv_7  54.90  0.0506  1.8
[05/17/2022-15:34:13] [I] Conv_8 + Add_9 + Relu_10  67.45  0.0622  2.2
[05/17/2022-15:34:13] [I] Conv_11 + Relu_12  44.14  0.0407  1.4
[05/17/2022-15:34:13] [I] Conv_13 + Relu_14  52.92  0.0488  1.7
[05/17/2022-15:34:13] [I] Conv_15 + Add_16 + Relu_17  69.26  0.0638  2.2
[05/17/2022-15:34:13] [I] Conv_18 + Relu_19  43.98  0.0405  1.4
[05/17/2022-15:34:13] [I] Conv_20 + Relu_21  53.21  0.0490  1.7
[05/17/2022-15:34:13] [I] Conv_22 + Add_23 + Relu_24  70.16  0.0647  2.3
[05/17/2022-15:34:13] [I] Conv_25 + Relu_26  69.37  0.0639  2.2
[05/17/2022-15:34:13] [I] Conv_27 + Relu_28  67.43  0.0622  2.2
[05/17/2022-15:34:13] [I] Conv_29  50.22  0.0463  1.6
[05/17/2022-15:34:13] [I] Conv_30 + Add_31 + Relu_32  76.31  0.0703  2.5
[05/17/2022-15:34:13] [I] Conv_33 + Relu_34  45.51  0.0419  1.5
[05/17/2022-15:34:13] [I] Conv_35 + Relu_36  67.64  0.0623  2.2
[05/17/2022-15:34:13] [I] Conv_37 + Add_38 + Relu_39  60.27  0.0556  1.9
[05/17/2022-15:34:13] [I] Conv_40 + Relu_41  46.87  0.0432  1.5
[05/17/2022-15:34:13] [I] Conv_42 + Relu_43  67.57  0.0623  2.2
[05/17/2022-15:34:13] [I] Conv_44 + Add_45 + Relu_46  59.79  0.0551  1.9
[05/17/2022-15:34:13] [I] Conv_47 + Relu_48  45.58  0.0420  1.5
[05/17/2022-15:34:13] [I] Conv_49 + Relu_50  67.56  0.0623  2.2
[05/17/2022-15:34:13] [I] Conv_51 + Add_52 + Relu_53  59.72  0.0550  1.9
[05/17/2022-15:34:13] [I] Conv_54 + Relu_55  51.55  0.0475  1.7
[05/17/2022-15:34:13] [I] Conv_56 + Relu_57  65.30  0.0602  2.1
[05/17/2022-15:34:13] [I] Conv_58  41.61  0.0383  1.3
[05/17/2022-15:34:13] [I] Conv_59 + Add_60 + Relu_61  68.16  0.0628  2.2
[05/17/2022-15:34:13] [I] Conv_62 + Relu_63  36.53  0.0337  1.2
[05/17/2022-15:34:13] [I] Conv_64 + Relu_65  63.98  0.0590  2.1
[05/17/2022-15:34:13] [I] Conv_66 + Add_67 + Relu_68  47.05  0.0434  1.5
[05/17/2022-15:34:13] [I] Conv_69 + Relu_70  36.79  0.0339  1.2
[05/17/2022-15:34:13] [I] Conv_71 + Relu_72  64.03  0.0590  2.1
[05/17/2022-15:34:13] [I] Conv_73 + Add_74 + Relu_75  47.14  0.0434  1.5
[05/17/2022-15:34:13] [I] Conv_76 + Relu_77  36.59  0.0337  1.2
[05/17/2022-15:34:13] [I] Conv_78 + Relu_79  64.15  0.0591  2.1
[05/17/2022-15:34:13] [I] Conv_80 + Add_81 + Relu_82  47.05  0.0434  1.5
[05/17/2022-15:34:13] [I] Conv_83 + Relu_84  36.77  0.0339  1.2
[05/17/2022-15:34:13] [I] Conv_85 + Relu_86  65.80  0.0606  2.1
[05/17/2022-15:34:13] [I] Conv_87 + Add_88 + Relu_89  48.10  0.0443  1.6
[05/17/2022-15:34:13] [I] Conv_90 + Relu_91  36.59  0.0337  1.2
[05/17/2022-15:34:13] [I] Conv_92 + Relu_93  64.14  0.0591  2.1
[05/17/2022-15:34:13] [I] Conv_94 + Add_95 + Relu_96  47.04  0.0434  1.5
[05/17/2022-15:34:13] [I] Conv_97 + Relu_98  59.96  0.0553  1.9
[05/17/2022-15:34:13] [I] Conv_99 + Relu_100  87.81  0.0809  2.8
[05/17/2022-15:34:13] [I] Conv_101  46.70  0.0430  1.5
[05/17/2022-15:34:13] [I] Conv_102 + Add_103 + Relu_104  70.82  0.0653  2.3
[05/17/2022-15:34:13] [I] Conv_105 + Relu_106  46.67  0.0430  1.5
[05/17/2022-15:34:13] [I] Conv_107 + Relu_108  81.05  0.0747  2.6
[05/17/2022-15:34:13] [I] Conv_109 + Add_110 + Relu_111  44.21  0.0407  1.4
[05/17/2022-15:34:13] [I] Conv_112 + Relu_113  45.17  0.0416  1.5
[05/17/2022-15:34:13] [I] Conv_114 + Relu_115  79.17  0.0730  2.6
[05/17/2022-15:34:13] [I] Conv_116 + Add_117 + Relu_118  39.06  0.0360  1.3
[05/17/2022-15:34:13] [I] Reformatting CopyNode for Output Tensor 0 to Conv_116 + Add_117 + Relu_118  12.10  0.0112  0.4
[05/17/2022-15:34:13] [I] Total  3094.78  2.8523  100.0
[05/17/2022-15:34:13] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --loadEngine=resnet50_sim_mod_GPU_fp16.trt --fp16 --dumpProfile