Dear @epi1,
I was able to run your model on JetPack 4.6.3 with trtexec. The full log is below:
nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$ ./trtexec --onnx=/home/nvidia/my_model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=/home/nvidia/my_model.onnx
[03/15/2023-02:25:19] [I] === Model Options ===
[03/15/2023-02:25:19] [I] Format: ONNX
[03/15/2023-02:25:19] [I] Model: /home/nvidia/my_model.onnx
[03/15/2023-02:25:19] [I] Output:
[03/15/2023-02:25:19] [I] === Build Options ===
[03/15/2023-02:25:19] [I] Max batch: explicit batch
[03/15/2023-02:25:19] [I] Workspace: 16 MiB
[03/15/2023-02:25:19] [I] minTiming: 1
[03/15/2023-02:25:19] [I] avgTiming: 8
[03/15/2023-02:25:19] [I] Precision: FP32
[03/15/2023-02:25:19] [I] Calibration:
[03/15/2023-02:25:19] [I] Refit: Disabled
[03/15/2023-02:25:19] [I] Sparsity: Disabled
[03/15/2023-02:25:19] [I] Safe mode: Disabled
[03/15/2023-02:25:19] [I] DirectIO mode: Disabled
[03/15/2023-02:25:19] [I] Restricted mode: Disabled
[03/15/2023-02:25:19] [I] Save engine:
[03/15/2023-02:25:19] [I] Load engine:
[03/15/2023-02:25:19] [I] Profiling verbosity: 0
[03/15/2023-02:25:19] [I] Tactic sources: Using default tactic sources
[03/15/2023-02:25:19] [I] timingCacheMode: local
[03/15/2023-02:25:19] [I] timingCacheFile:
[03/15/2023-02:25:19] [I] Input(s)s format: fp32:CHW
[03/15/2023-02:25:19] [I] Output(s)s format: fp32:CHW
[03/15/2023-02:25:19] [I] Input build shapes: model
[03/15/2023-02:25:19] [I] Input calibration shapes: model
[03/15/2023-02:25:19] [I] === System Options ===
[03/15/2023-02:25:19] [I] Device: 0
[03/15/2023-02:25:19] [I] DLACore:
[03/15/2023-02:25:19] [I] Plugins:
[03/15/2023-02:25:19] [I] === Inference Options ===
[03/15/2023-02:25:19] [I] Batch: Explicit
[03/15/2023-02:25:19] [I] Input inference shapes: model
[03/15/2023-02:25:19] [I] Iterations: 10
[03/15/2023-02:25:19] [I] Duration: 3s (+ 200ms warm up)
[03/15/2023-02:25:19] [I] Sleep time: 0ms
[03/15/2023-02:25:19] [I] Idle time: 0ms
[03/15/2023-02:25:19] [I] Streams: 1
[03/15/2023-02:25:19] [I] ExposeDMA: Disabled
[03/15/2023-02:25:19] [I] Data transfers: Enabled
[03/15/2023-02:25:19] [I] Spin-wait: Disabled
[03/15/2023-02:25:19] [I] Multithreading: Disabled
[03/15/2023-02:25:19] [I] CUDA Graph: Disabled
[03/15/2023-02:25:19] [I] Separate profiling: Disabled
[03/15/2023-02:25:19] [I] Time Deserialize: Disabled
[03/15/2023-02:25:19] [I] Time Refit: Disabled
[03/15/2023-02:25:19] [I] Skip inference: Disabled
[03/15/2023-02:25:19] [I] Inputs:
[03/15/2023-02:25:19] [I] === Reporting Options ===
[03/15/2023-02:25:19] [I] Verbose: Disabled
[03/15/2023-02:25:19] [I] Averages: 10 inferences
[03/15/2023-02:25:19] [I] Percentile: 99
[03/15/2023-02:25:19] [I] Dump refittable layers:Disabled
[03/15/2023-02:25:19] [I] Dump output: Disabled
[03/15/2023-02:25:19] [I] Profile: Disabled
[03/15/2023-02:25:19] [I] Export timing to JSON file:
[03/15/2023-02:25:19] [I] Export output to JSON file:
[03/15/2023-02:25:19] [I] Export profile to JSON file:
[03/15/2023-02:25:19] [I]
[03/15/2023-02:25:19] [I] === Device Information ===
[03/15/2023-02:25:19] [I] Selected Device: NVIDIA Tegra X2
[03/15/2023-02:25:19] [I] Compute Capability: 6.2
[03/15/2023-02:25:19] [I] SMs: 2
[03/15/2023-02:25:19] [I] Compute Clock Rate: 1.3 GHz
[03/15/2023-02:25:19] [I] Device Global Memory: 7858 MiB
[03/15/2023-02:25:19] [I] Shared Memory per SM: 64 KiB
[03/15/2023-02:25:19] [I] Memory Bus Width: 128 bits (ECC disabled)
[03/15/2023-02:25:19] [I] Memory Clock Rate: 1.3 GHz
[03/15/2023-02:25:19] [I]
[03/15/2023-02:25:19] [I] TensorRT version: 8.2.1
[03/15/2023-02:25:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 285, GPU 6235 (MiB)
[03/15/2023-02:25:21] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 285 MiB, GPU 6235 MiB
[03/15/2023-02:25:22] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 314 MiB, GPU 6265 MiB
[03/15/2023-02:25:22] [I] Start parsing network model
[03/15/2023-02:25:22] [I] [TRT] ----------------------------------------------------------------
[03/15/2023-02:25:22] [I] [TRT] Input filename: /home/nvidia/my_model.onnx
[03/15/2023-02:25:22] [I] [TRT] ONNX IR version: 0.0.7
[03/15/2023-02:25:22] [I] [TRT] Opset version: 12
[03/15/2023-02:25:22] [I] [TRT] Producer name: pytorch
[03/15/2023-02:25:22] [I] [TRT] Producer version: 1.13.0
[03/15/2023-02:25:22] [I] [TRT] Domain:
[03/15/2023-02:25:22] [I] [TRT] Model version: 0
[03/15/2023-02:25:22] [I] [TRT] Doc string:
[03/15/2023-02:25:22] [I] [TRT] ----------------------------------------------------------------
[03/15/2023-02:25:22] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/15/2023-02:25:22] [I] Finish parsing network model
[03/15/2023-02:25:22] [I] [TRT] ---------- Layers Running on DLA ----------
[03/15/2023-02:25:22] [I] [TRT] ---------- Layers Running on GPU ----------
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.0/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.0/act/Sigmoid), /model.0/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.1/act/Sigmoid), /model.1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/cv1/conv/Conv || /model.2/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/cv1/act/Sigmoid), /model.2/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/m/m.0/cv1/act/Sigmoid), /model.2/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.2/m/m.0/cv2/act/Sigmoid), /model.2/m/m.0/cv2/act/Mul), /model.2/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/cv2/act/Sigmoid), /model.2/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/cv3/act/Sigmoid), /model.2/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.3/act/Sigmoid), /model.3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/cv1/conv/Conv || /model.4/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/cv1/act/Sigmoid), /model.4/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/m/m.0/cv1/act/Sigmoid), /model.4/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.4/m/m.0/cv2/act/Sigmoid), /model.4/m/m.0/cv2/act/Mul), /model.4/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/cv2/act/Sigmoid), /model.4/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/cv3/act/Sigmoid), /model.4/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.5/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.5/act/Sigmoid), /model.5/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/cv1/conv/Conv || /model.6/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/cv1/act/Sigmoid), /model.6/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/m/m.0/cv1/act/Sigmoid), /model.6/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.6/m/m.0/cv2/act/Sigmoid), /model.6/m/m.0/cv2/act/Mul), /model.6/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/cv2/act/Sigmoid), /model.6/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/cv3/act/Sigmoid), /model.6/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.7/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.7/act/Sigmoid), /model.7/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/cv1/conv/Conv || /model.8/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/cv1/act/Sigmoid), /model.8/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/m/m.0/cv1/act/Sigmoid), /model.8/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.8/m/m.0/cv2/act/Sigmoid), /model.8/m/m.0/cv2/act/Mul), /model.8/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/cv2/act/Sigmoid), /model.8/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/cv3/act/Sigmoid), /model.8/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.9/cv1/act/Sigmoid), /model.9/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m/MaxPool
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_1/MaxPool
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_2/MaxPool
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/cv1/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m/MaxPool_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_1/MaxPool_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_2/MaxPool_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.9/cv2/act/Sigmoid), /model.9/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.10/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.10/act/Sigmoid), /model.10/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.11/Resize
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.11/Resize_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/cv1/conv/Conv || /model.13/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/cv1/act/Sigmoid), /model.13/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/m/m.0/cv1/act/Sigmoid), /model.13/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/m/m.0/cv2/act/Sigmoid), /model.13/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/cv2/act/Sigmoid), /model.13/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/cv3/act/Sigmoid), /model.13/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.14/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.14/act/Sigmoid), /model.14/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.15/Resize
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.15/Resize_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/cv1/conv/Conv || /model.17/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/cv1/act/Sigmoid), /model.17/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/m/m.0/cv1/act/Sigmoid), /model.17/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/m/m.0/cv2/act/Sigmoid), /model.17/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/cv2/act/Sigmoid), /model.17/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/cv3/act/Sigmoid), /model.17/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.18/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.18/act/Sigmoid), /model.18/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.18/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.14/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/cv1/conv/Conv || /model.20/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/cv1/act/Sigmoid), /model.20/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/m/m.0/cv1/act/Sigmoid), /model.20/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/m/m.0/cv2/act/Sigmoid), /model.20/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/cv2/act/Sigmoid), /model.20/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/cv3/act/Sigmoid), /model.20/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.21/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.21/act/Sigmoid), /model.21/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.10/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/cv1/conv/Conv || /model.23/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/cv1/act/Sigmoid), /model.23/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/m/m.0/cv1/act/Sigmoid), /model.23/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/m/m.0/cv2/act/Sigmoid), /model.23/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/cv2/act/Sigmoid), /model.23/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/cv3/act/Sigmoid), /model.23/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/m.0/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape + /model.24/Transpose
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Sigmoid)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_0
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_1_output_0 + (Unnamed Layer* 183) [Shuffle] + /model.24/Mul
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_2_output_0 + /model.24/Add
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_3_output_0 + (Unnamed Layer* 188) [Shuffle] + /model.24/Mul_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Constant_5_output_0 + (Unnamed Layer* 194) [Shuffle], PWN(/model.24/Constant_4_output_0 + (Unnamed Layer* 191) [Shuffle] + /model.24/Mul_2, /model.24/Pow))
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_6_output_0 + /model.24/Mul_3
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_1_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_3_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_output_2 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/m.1/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_2 + /model.24/Transpose_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Sigmoid_1)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_3
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_4
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_9_output_0 + (Unnamed Layer* 208) [Shuffle] + /model.24/Mul_4
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_10_output_0 + /model.24/Add_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_11_output_0 + (Unnamed Layer* 213) [Shuffle] + /model.24/Mul_5
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Constant_13_output_0 + (Unnamed Layer* 219) [Shuffle], PWN(/model.24/Constant_12_output_0 + (Unnamed Layer* 216) [Shuffle] + /model.24/Mul_6, /model.24/Pow_1))
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_14_output_0 + /model.24/Mul_7
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_5_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_7_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_output_2 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_3
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/m.2/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_4 + /model.24/Transpose_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Sigmoid_2)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2_5
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2_6
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_17_output_0 + (Unnamed Layer* 233) [Shuffle] + /model.24/Mul_8
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_18_output_0 + /model.24/Add_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_19_output_0 + (Unnamed Layer* 238) [Shuffle] + /model.24/Mul_9
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Constant_21_output_0 + (Unnamed Layer* 244) [Shuffle], PWN(/model.24/Constant_20_output_0 + (Unnamed Layer* 241) [Shuffle] + /model.24/Mul_10, /model.24/Pow_2))
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_22_output_0 + /model.24/Mul_11
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_9_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_11_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2_output_2 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_5
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_1_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_3_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_5_output_0 copy
[03/15/2023-02:25:23] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +294, now: CPU 483, GPU 6562 (MiB)
[03/15/2023-02:25:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +250, GPU +310, now: CPU 733, GPU 6872 (MiB)
[03/15/2023-02:25:25] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/15/2023-02:27:18] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[03/15/2023-02:27:18] [I] [TRT] Total Host Persistent Memory: 105664
[03/15/2023-02:27:18] [I] [TRT] Total Device Persistent Memory: 545792
[03/15/2023-02:27:18] [I] [TRT] Total Scratch Memory: 0
[03/15/2023-02:27:18] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 18 MiB
[03/15/2023-02:27:18] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 53.3691ms to assign 9 blocks to 136 nodes requiring 483330 bytes.
[03/15/2023-02:27:18] [I] [TRT] Total Activation Memory: 483330
[03/15/2023-02:27:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 989, GPU 6996 (MiB)
[03/15/2023-02:27:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 989, GPU 6996 (MiB)
[03/15/2023-02:27:18] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 982, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] Loaded engine size: 1 MiB
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 988, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 988, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[03/15/2023-02:27:19] [I] Engine built in 119.178 sec.
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 957, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 957, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[03/15/2023-02:27:19] [I] Using random values for input images
[03/15/2023-02:27:19] [I] Created input binding for images with dimensions 1x3x160x160
[03/15/2023-02:27:19] [I] Using random values for output output0
[03/15/2023-02:27:19] [I] Created output binding for output0 with dimensions 1x525x6
[03/15/2023-02:27:19] [I] Starting inference
[03/15/2023-02:27:22] [I] Warmup completed 49 queries over 200 ms
[03/15/2023-02:27:22] [I] Timing trace has 868 queries over 3.00742 s
[03/15/2023-02:27:22] [I]
[03/15/2023-02:27:22] [I] === Trace details ===
[03/15/2023-02:27:22] [I] Trace averages of 10 runs:
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37973 ms - Host latency: 3.44337 ms (end to end 3.45868 ms, enqueue 3.37324 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36009 ms - Host latency: 3.42313 ms (end to end 3.4379 ms, enqueue 3.35294 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35954 ms - Host latency: 3.4242 ms (end to end 3.43946 ms, enqueue 3.35337 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36441 ms - Host latency: 3.42755 ms (end to end 3.44239 ms, enqueue 3.35734 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35561 ms - Host latency: 3.41825 ms (end to end 3.43305 ms, enqueue 3.34828 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35867 ms - Host latency: 3.42149 ms (end to end 3.43646 ms, enqueue 3.35217 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3594 ms - Host latency: 3.42205 ms (end to end 3.43697 ms, enqueue 3.35123 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35564 ms - Host latency: 3.41833 ms (end to end 3.4334 ms, enqueue 3.34905 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36611 ms - Host latency: 3.42921 ms (end to end 3.44414 ms, enqueue 3.35906 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37684 ms - Host latency: 3.44072 ms (end to end 3.45593 ms, enqueue 3.37038 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37711 ms - Host latency: 3.44067 ms (end to end 3.45662 ms, enqueue 3.36978 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37596 ms - Host latency: 3.43929 ms (end to end 3.45435 ms, enqueue 3.36937 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3633 ms - Host latency: 3.42613 ms (end to end 3.44116 ms, enqueue 3.35617 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35262 ms - Host latency: 3.4149 ms (end to end 3.42965 ms, enqueue 3.346 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35351 ms - Host latency: 3.41588 ms (end to end 3.43211 ms, enqueue 3.34682 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35709 ms - Host latency: 3.41956 ms (end to end 3.43444 ms, enqueue 3.35045 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36039 ms - Host latency: 3.42296 ms (end to end 3.43767 ms, enqueue 3.35323 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36906 ms - Host latency: 3.43171 ms (end to end 3.44664 ms, enqueue 3.36246 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36072 ms - Host latency: 3.42542 ms (end to end 3.44056 ms, enqueue 3.35414 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36259 ms - Host latency: 3.42549 ms (end to end 3.44034 ms, enqueue 3.35581 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3578 ms - Host latency: 3.42057 ms (end to end 3.4355 ms, enqueue 3.35126 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.38356 ms - Host latency: 3.44731 ms (end to end 3.46257 ms, enqueue 3.37688 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.41713 ms - Host latency: 3.48584 ms (end to end 3.50579 ms, enqueue 3.41525 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34083 ms - Host latency: 3.40789 ms (end to end 3.42914 ms, enqueue 3.34422 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35865 ms - Host latency: 3.42581 ms (end to end 3.44727 ms, enqueue 3.36218 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.38176 ms - Host latency: 3.45028 ms (end to end 3.47119 ms, enqueue 3.38424 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3583 ms - Host latency: 3.42749 ms (end to end 3.44839 ms, enqueue 3.36051 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35183 ms - Host latency: 3.42036 ms (end to end 3.44188 ms, enqueue 3.35532 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.355 ms - Host latency: 3.42231 ms (end to end 3.44413 ms, enqueue 3.35844 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34822 ms - Host latency: 3.41572 ms (end to end 3.43705 ms, enqueue 3.35184 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3463 ms - Host latency: 3.4151 ms (end to end 3.43523 ms, enqueue 3.34885 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3444 ms - Host latency: 3.41129 ms (end to end 3.43234 ms, enqueue 3.34785 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35765 ms - Host latency: 3.42441 ms (end to end 3.4458 ms, enqueue 3.36113 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34667 ms - Host latency: 3.41409 ms (end to end 3.43542 ms, enqueue 3.35022 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36234 ms - Host latency: 3.43009 ms (end to end 3.45153 ms, enqueue 3.36414 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34717 ms - Host latency: 3.41427 ms (end to end 3.43562 ms, enqueue 3.35096 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34348 ms - Host latency: 3.41075 ms (end to end 3.43204 ms, enqueue 3.34717 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.341 ms - Host latency: 3.40814 ms (end to end 3.42936 ms, enqueue 3.34465 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34864 ms - Host latency: 3.41658 ms (end to end 3.43878 ms, enqueue 3.35217 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35906 ms - Host latency: 3.42684 ms (end to end 3.44821 ms, enqueue 3.36271 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35138 ms - Host latency: 3.41849 ms (end to end 3.43915 ms, enqueue 3.35476 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34567 ms - Host latency: 3.41276 ms (end to end 3.43381 ms, enqueue 3.34886 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34243 ms - Host latency: 3.40969 ms (end to end 3.43102 ms, enqueue 3.34606 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34042 ms - Host latency: 3.40753 ms (end to end 3.42848 ms, enqueue 3.34392 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33904 ms - Host latency: 3.40852 ms (end to end 3.42989 ms, enqueue 3.34269 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3588 ms - Host latency: 3.42646 ms (end to end 3.44789 ms, enqueue 3.36224 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35725 ms - Host latency: 3.42472 ms (end to end 3.44568 ms, enqueue 3.36042 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3537 ms - Host latency: 3.42214 ms (end to end 3.44257 ms, enqueue 3.3561 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3525 ms - Host latency: 3.41919 ms (end to end 3.4402 ms, enqueue 3.35583 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3387 ms - Host latency: 3.40544 ms (end to end 3.42803 ms, enqueue 3.3422 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34189 ms - Host latency: 3.40869 ms (end to end 3.42937 ms, enqueue 3.34491 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.39476 ms - Host latency: 3.46364 ms (end to end 3.48589 ms, enqueue 3.39766 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.345 ms - Host latency: 3.41212 ms (end to end 3.43331 ms, enqueue 3.34812 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33256 ms - Host latency: 3.39952 ms (end to end 3.42039 ms, enqueue 3.33608 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37795 ms - Host latency: 3.44673 ms (end to end 3.46829 ms, enqueue 3.38084 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37588 ms - Host latency: 3.44421 ms (end to end 3.46567 ms, enqueue 3.37917 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35256 ms - Host latency: 3.42034 ms (end to end 3.44099 ms, enqueue 3.35574 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37051 ms - Host latency: 3.43982 ms (end to end 3.46125 ms, enqueue 3.37292 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34854 ms - Host latency: 3.41553 ms (end to end 3.4364 ms, enqueue 3.352 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34839 ms - Host latency: 3.41526 ms (end to end 3.43625 ms, enqueue 3.35137 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35237 ms - Host latency: 3.41931 ms (end to end 3.44055 ms, enqueue 3.35562 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35586 ms - Host latency: 3.42485 ms (end to end 3.44602 ms, enqueue 3.35916 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34526 ms - Host latency: 3.41223 ms (end to end 3.43298 ms, enqueue 3.34863 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35427 ms - Host latency: 3.42124 ms (end to end 3.44185 ms, enqueue 3.35698 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34683 ms - Host latency: 3.41375 ms (end to end 3.43489 ms, enqueue 3.3499 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33723 ms - Host latency: 3.4041 ms (end to end 3.4252 ms, enqueue 3.34092 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34954 ms - Host latency: 3.41663 ms (end to end 3.43787 ms, enqueue 3.35273 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.42983 ms - Host latency: 3.498 ms (end to end 3.51914 ms, enqueue 3.43201 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3801 ms - Host latency: 3.44836 ms (end to end 3.46956 ms, enqueue 3.38306 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37227 ms - Host latency: 3.43989 ms (end to end 3.46147 ms, enqueue 3.37527 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34736 ms - Host latency: 3.41448 ms (end to end 3.43557 ms, enqueue 3.35083 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34255 ms - Host latency: 3.40999 ms (end to end 3.43108 ms, enqueue 3.34597 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34573 ms - Host latency: 3.4136 ms (end to end 3.43408 ms, enqueue 3.34851 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34307 ms - Host latency: 3.40974 ms (end to end 3.43057 ms, enqueue 3.34639 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34873 ms - Host latency: 3.41541 ms (end to end 3.43628 ms, enqueue 3.35232 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33437 ms - Host latency: 3.40232 ms (end to end 3.42332 ms, enqueue 3.33765 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35198 ms - Host latency: 3.41943 ms (end to end 3.4405 ms, enqueue 3.35525 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3438 ms - Host latency: 3.41135 ms (end to end 3.43198 ms, enqueue 3.34687 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.346 ms - Host latency: 3.41392 ms (end to end 3.43503 ms, enqueue 3.34949 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35764 ms - Host latency: 3.42578 ms (end to end 3.44717 ms, enqueue 3.36123 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.39685 ms - Host latency: 3.46648 ms (end to end 3.48865 ms, enqueue 3.40032 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36519 ms - Host latency: 3.43274 ms (end to end 3.45403 ms, enqueue 3.36885 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35559 ms - Host latency: 3.42263 ms (end to end 3.44382 ms, enqueue 3.35767 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37708 ms - Host latency: 3.44573 ms (end to end 3.46748 ms, enqueue 3.37974 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37043 ms - Host latency: 3.43958 ms (end to end 3.46052 ms, enqueue 3.37292 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36682 ms - Host latency: 3.43455 ms (end to end 3.45488 ms, enqueue 3.36936 ms)
[03/15/2023-02:27:22] [I]
[03/15/2023-02:27:22] [I] === Performance summary ===
[03/15/2023-02:27:22] [I] Throughput: 288.619 qps
[03/15/2023-02:27:22] [I] Latency: min = 3.37036 ms, max = 3.8103 ms, mean = 3.42449 ms, median = 3.41943 ms, percentile(99%) = 3.55396 ms
[03/15/2023-02:27:22] [I] End-to-End Host Latency: min = 3.39087 ms, max = 3.83447 ms, mean = 3.44411 ms, median = 3.43848 ms, percentile(99%) = 3.57617 ms
[03/15/2023-02:27:22] [I] Enqueue Time: min = 3.30713 ms, max = 3.74487 ms, mean = 3.35857 ms, median = 3.35376 ms, percentile(99%) = 3.48315 ms
[03/15/2023-02:27:22] [I] H2D Latency: min = 0.038147 ms, max = 0.0578613 ms, mean = 0.0421989 ms, median = 0.0427246 ms, percentile(99%) = 0.0466309 ms
[03/15/2023-02:27:22] [I] GPU Compute Time: min = 3.30371 ms, max = 3.74072 ms, mean = 3.35798 ms, median = 3.35327 ms, percentile(99%) = 3.4812 ms
[03/15/2023-02:27:22] [I] D2H Latency: min = 0.0227051 ms, max = 0.0405273 ms, mean = 0.0243122 ms, median = 0.0241699 ms, percentile(99%) = 0.0296631 ms
[03/15/2023-02:27:22] [I] Total Host Walltime: 3.00742 s
[03/15/2023-02:27:22] [I] Total GPU Compute Time: 2.91473 s
[03/15/2023-02:27:22] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[03/15/2023-02:27:22] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[03/15/2023-02:27:22] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/15/2023-02:27:22] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=/home/nvidia/my_model.onnx
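
If you want to cross-check the summary figures in the log, a few lines of Python are enough. This is only a rough sketch: `parse_summary` and the `LOG` excerpt are names made up for illustration, and the regular expressions assume the trtexec 8.2 output format shown above (other versions may word these lines differently).

```python
import re

# Four lines copied verbatim from the trtexec output above.
LOG = """\
[03/15/2023-02:27:22] [I] Timing trace has 868 queries over 3.00742 s
[03/15/2023-02:27:22] [I] Throughput: 288.619 qps
[03/15/2023-02:27:22] [I] Enqueue Time: min = 3.30713 ms, max = 3.74487 ms, mean = 3.35857 ms, median = 3.35376 ms, percentile(99%) = 3.48315 ms
[03/15/2023-02:27:22] [I] GPU Compute Time: min = 3.30371 ms, max = 3.74072 ms, mean = 3.35798 ms, median = 3.35327 ms, percentile(99%) = 3.4812 ms
"""


def parse_summary(log: str) -> dict:
    """Extract a few performance figures from trtexec log text (hypothetical helper)."""
    out = {}

    # Number of timed queries and the wall-clock time they took.
    m = re.search(r"Timing trace has (\d+) queries over ([\d.]+) s", log)
    if m:
        out["queries"] = int(m.group(1))
        out["walltime_s"] = float(m.group(2))

    # Throughput as reported by trtexec.
    m = re.search(r"Throughput: ([\d.]+) qps", log)
    if m:
        out["throughput_qps"] = float(m.group(1))

    # Mean enqueue time and mean GPU compute time, both in milliseconds.
    for key, label in (("enqueue_mean_ms", "Enqueue Time"),
                       ("gpu_mean_ms", "GPU Compute Time")):
        m = re.search(re.escape(label) + r": .*?mean = ([\d.]+) ms", log)
        if m:
            out[key] = float(m.group(1))

    return out


if __name__ == "__main__":
    s = parse_summary(LOG)
    # Throughput recomputed from the trace: queries divided by wall time.
    print("recomputed qps:", s["queries"] / s["walltime_s"])
    print("reported qps:  ", s["throughput_qps"])
    # A ratio close to 1 means enqueue takes about as long as the GPU work.
    print("enqueue / GPU: ", s["enqueue_mean_ms"] / s["gpu_mean_ms"])
```

The recomputed throughput (868 queries / 3.00742 s) agrees with the reported 288.619 qps. The mean enqueue time is almost identical to the mean GPU compute time, which is consistent with the warning in the log that throughput may be bound by enqueue time and that --useCudaGraph may help.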