ONNX to TensorRT engine conversion fails with "layers must have distinct names"

epi1 · March 3, 2023, 2:36pm

I want to convert my ONNX model using trtexec --onnx=my_model.onnx, but it fails with:
[E] [TRT] Repeated layer name: /model.24/Split_1 (layers must have distinct names)
I am on a Jetson TX2 with JetPack 4.4 (L4T r32.4.3) and TensorRT 7.1.3.

When inspecting my model with Netron, I cannot find any layers with the same name. The same ONNX model converts fine to TensorRT on my desktop computer.
My ONNX model was exported from a PyTorch implementation of the YOLOv5 architecture. I uploaded my model:
my_model.onnx (410.8 KB)
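Netron displays ONNX node names, not the TensorRT layer names the parser creates, so a clash that only exists after parsing will not show up there. Judging from the TensorRT 8.2 layer list later in this thread, the first Split node in /model.24 becomes three layers named /model.24/Split, /model.24/Split_0 and /model.24/Split_1, while the second Split node is itself named /model.24/Split_1. The sketch below looks for that pattern: a multi-output node whose name plus a _<k> suffix equals another node's own name. It rests on an assumption (that derived layer names take the form <node name>_<k>, inferred from the log rather than from TensorRT documentation); find_suffix_collisions and load_nodes are hypothetical helpers, and load_nodes requires the onnx Python package.

```python
def find_suffix_collisions(nodes):
    """Return (derived_name, source_node) pairs where '<source_node>_<k>'
    equals the name of some node in the graph.

    nodes: list of (name, op_type, num_outputs) tuples.
    Assumption (inferred from the TensorRT 8.2 log, not documented):
    a node with several outputs may be turned into several layers whose
    names are derived as '<node name>_<k>'.
    """
    own_names = {name for name, _, _ in nodes if name}
    collisions = []
    for name, _op_type, num_outputs in nodes:
        if num_outputs < 2:
            continue  # single-output nodes are assumed to map to one layer
        for k in range(num_outputs):
            derived = f"{name}_{k}"
            if derived in own_names:
                collisions.append((derived, name))
    return collisions


def load_nodes(path):
    """Read (name, op_type, num_outputs) for every top-level ONNX node."""
    import onnx  # assumption: the onnx package is installed

    model = onnx.load(path)
    return [(n.name, n.op_type, len(n.output)) for n in model.graph.node]
```

Usage, assuming the model file sits in the current directory: print(find_suffix_collisions(load_nodes("my_model.onnx"))). An empty list only means this particular pattern is absent, not that the model is guaranteed to parse.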

LOGS:

&&&& RUNNING TensorRT.trtexec # trtexec --onnx=my_model.onnx
[03/03/2023-15:21:28] [I] === Model Options ===
[03/03/2023-15:21:28] [I] Format: ONNX
[03/03/2023-15:21:28] [I] Model: my_model.onnx
[03/03/2023-15:21:28] [I] Output:
[03/03/2023-15:21:28] [I] === Build Options ===
[03/03/2023-15:21:28] [I] Max batch: 1
[03/03/2023-15:21:28] [I] Workspace: 16 MB
[03/03/2023-15:21:28] [I] minTiming: 1
[03/03/2023-15:21:28] [I] avgTiming: 8
[03/03/2023-15:21:28] [I] Precision: FP32
[03/03/2023-15:21:28] [I] Calibration:
[03/03/2023-15:21:28] [I] Safe mode: Disabled
[03/03/2023-15:21:28] [I] Save engine:
[03/03/2023-15:21:28] [I] Load engine:
[03/03/2023-15:21:28] [I] Builder Cache: Enabled
[03/03/2023-15:21:28] [I] NVTX verbosity: 0
[03/03/2023-15:21:28] [I] Inputs format: fp32:CHW
[03/03/2023-15:21:28] [I] Outputs format: fp32:CHW
[03/03/2023-15:21:28] [I] Input build shapes: model
[03/03/2023-15:21:28] [I] Input calibration shapes: model
[03/03/2023-15:21:28] [I] === System Options ===
[03/03/2023-15:21:28] [I] Device: 0
[03/03/2023-15:21:28] [I] DLACore:
[03/03/2023-15:21:28] [I] Plugins:
[03/03/2023-15:21:28] [I] === Inference Options ===
[03/03/2023-15:21:28] [I] Batch: 1
[03/03/2023-15:21:28] [I] Input inference shapes: model
[03/03/2023-15:21:28] [I] Iterations: 10
[03/03/2023-15:21:28] [I] Duration: 3s (+ 200ms warm up)
[03/03/2023-15:21:28] [I] Sleep time: 0ms
[03/03/2023-15:21:28] [I] Streams: 1
[03/03/2023-15:21:28] [I] ExposeDMA: Disabled
[03/03/2023-15:21:28] [I] Spin-wait: Disabled
[03/03/2023-15:21:28] [I] Multithreading: Disabled
[03/03/2023-15:21:28] [I] CUDA Graph: Disabled
[03/03/2023-15:21:28] [I] Skip inference: Disabled
[03/03/2023-15:21:28] [I] Inputs:
[03/03/2023-15:21:28] [I] === Reporting Options ===
[03/03/2023-15:21:28] [I] Verbose: Disabled
[03/03/2023-15:21:28] [I] Averages: 10 inferences
[03/03/2023-15:21:28] [I] Percentile: 99
[03/03/2023-15:21:28] [I] Dump output: Disabled
[03/03/2023-15:21:28] [I] Profile: Disabled
[03/03/2023-15:21:28] [I] Export timing to JSON file:
[03/03/2023-15:21:28] [I] Export output to JSON file:
[03/03/2023-15:21:28] [I] Export profile to JSON file:
[03/03/2023-15:21:28] [I]
Input filename: my_model.onnx
ONNX IR version: 0.0.7
Opset version: 12
Producer name: pytorch
Producer version: 1.13.0
Domain:
Model version: 0
Doc string:
[03/03/2023-15:21:29] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/03/2023-15:21:29] [E] [TRT] Repeated layer name: /model.24/Split_1 (layers must have distinct names)
[03/03/2023-15:21:29] [E] [TRT] Network validation failed.
[03/03/2023-15:21:29] [E] Engine creation failed
[03/03/2023-15:21:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=my_model.onnx

Is this a bug in the TensorRT version that ships with this L4T release?
Any idea how this can be fixed?

(I tried the same inside a container started with docker run -it --rm --net=host -v ~/models/:/app nvcr.io/nvidia/l4t-base:r32.4.3 and got exactly the same result as on the TX2.)

This post describes a very similar problem, just on another platform. Could the same fix work for me? Unfortunately, I have no idea how to perform such a TensorRT downgrade on our custom Yocto platform.
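If changing the TensorRT version is not an option, one possible workaround is to rename the ONNX nodes so that no node name can coincide with a layer name the parser derives from another node. This is untested on TensorRT 7.1.3 and assumes the parser derives extra layer names by appending _<k> to the node name (an inference from the later TensorRT 8.2 log, where the first Split node yields a layer /model.24/Split_1 and the second Split node is itself named /model.24/Split_1). Replacing every underscore in node names means no renamed node can equal a name of the form <name>_<k>. Node names are only labels in ONNX (nodes are connected through tensor names), so renaming them should not change what the model computes. sanitize_names, rename_onnx_nodes, the "." replacement and the output file name are hypothetical choices; the file step requires the onnx Python package and only renames top-level graph nodes.

```python
def sanitize_names(names, replacement="."):
    """Replace '_' in every node name so that no node name can match a
    derived '<name>_<k>' layer name (assumed parser naming scheme)."""
    new_names = [name.replace("_", replacement) for name in names]
    non_empty = [n for n in new_names if n]  # empty names are left as-is
    if len(set(non_empty)) != len(non_empty):
        raise ValueError("renaming produced duplicate node names; "
                         "choose a different replacement string")
    return new_names


def rename_onnx_nodes(src_path, dst_path, replacement="."):
    """Load an ONNX model, rename its top-level nodes, and save a copy."""
    import onnx  # assumption: the onnx package is installed

    model = onnx.load(src_path)
    nodes = model.graph.node
    new_names = sanitize_names([n.name for n in nodes], replacement)
    for node, new_name in zip(nodes, new_names):
        # Only the node label changes; input/output tensor names stay intact.
        node.name = new_name
    onnx.save(model, dst_path)
```

For example, rename_onnx_nodes("my_model.onnx", "my_model_renamed.onnx"), then run trtexec --onnx=my_model_renamed.onnx on the TX2.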

SivaRamaKrishnaNV · March 6, 2023, 5:02am

Dear @epi1,
Could you test with JetPack 4.6.3?

epi1 · March 6, 2023, 10:01am

Unfortunately, I cannot upgrade to a newer JetPack version; I'm stuck with JetPack 4.4.

SivaRamaKrishnaNV · March 14, 2023, 2:26pm

Dear @epi1 ,
Did you try building the open-source ONNX parser?

SivaRamaKrishnaNV · March 15, 2023, 2:54am

Dear @epi1,
I was able to run your model on JetPack 4.6.3 (TensorRT 8.2.1):


nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$ ./trtexec --onnx=/home/nvidia/my_model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=/home/nvidia/my_model.onnx
[03/15/2023-02:25:19] [I] === Model Options ===
[03/15/2023-02:25:19] [I] Format: ONNX
[03/15/2023-02:25:19] [I] Model: /home/nvidia/my_model.onnx
[03/15/2023-02:25:19] [I] Output:
[03/15/2023-02:25:19] [I] === Build Options ===
[03/15/2023-02:25:19] [I] Max batch: explicit batch
[03/15/2023-02:25:19] [I] Workspace: 16 MiB
[03/15/2023-02:25:19] [I] minTiming: 1
[03/15/2023-02:25:19] [I] avgTiming: 8
[03/15/2023-02:25:19] [I] Precision: FP32
[03/15/2023-02:25:19] [I] Calibration:
[03/15/2023-02:25:19] [I] Refit: Disabled
[03/15/2023-02:25:19] [I] Sparsity: Disabled
[03/15/2023-02:25:19] [I] Safe mode: Disabled
[03/15/2023-02:25:19] [I] DirectIO mode: Disabled
[03/15/2023-02:25:19] [I] Restricted mode: Disabled
[03/15/2023-02:25:19] [I] Save engine:
[03/15/2023-02:25:19] [I] Load engine:
[03/15/2023-02:25:19] [I] Profiling verbosity: 0
[03/15/2023-02:25:19] [I] Tactic sources: Using default tactic sources
[03/15/2023-02:25:19] [I] timingCacheMode: local
[03/15/2023-02:25:19] [I] timingCacheFile:
[03/15/2023-02:25:19] [I] Input(s)s format: fp32:CHW
[03/15/2023-02:25:19] [I] Output(s)s format: fp32:CHW
[03/15/2023-02:25:19] [I] Input build shapes: model
[03/15/2023-02:25:19] [I] Input calibration shapes: model
[03/15/2023-02:25:19] [I] === System Options ===
[03/15/2023-02:25:19] [I] Device: 0
[03/15/2023-02:25:19] [I] DLACore:
[03/15/2023-02:25:19] [I] Plugins:
[03/15/2023-02:25:19] [I] === Inference Options ===
[03/15/2023-02:25:19] [I] Batch: Explicit
[03/15/2023-02:25:19] [I] Input inference shapes: model
[03/15/2023-02:25:19] [I] Iterations: 10
[03/15/2023-02:25:19] [I] Duration: 3s (+ 200ms warm up)
[03/15/2023-02:25:19] [I] Sleep time: 0ms
[03/15/2023-02:25:19] [I] Idle time: 0ms
[03/15/2023-02:25:19] [I] Streams: 1
[03/15/2023-02:25:19] [I] ExposeDMA: Disabled
[03/15/2023-02:25:19] [I] Data transfers: Enabled
[03/15/2023-02:25:19] [I] Spin-wait: Disabled
[03/15/2023-02:25:19] [I] Multithreading: Disabled
[03/15/2023-02:25:19] [I] CUDA Graph: Disabled
[03/15/2023-02:25:19] [I] Separate profiling: Disabled
[03/15/2023-02:25:19] [I] Time Deserialize: Disabled
[03/15/2023-02:25:19] [I] Time Refit: Disabled
[03/15/2023-02:25:19] [I] Skip inference: Disabled
[03/15/2023-02:25:19] [I] Inputs:
[03/15/2023-02:25:19] [I] === Reporting Options ===
[03/15/2023-02:25:19] [I] Verbose: Disabled
[03/15/2023-02:25:19] [I] Averages: 10 inferences
[03/15/2023-02:25:19] [I] Percentile: 99
[03/15/2023-02:25:19] [I] Dump refittable layers:Disabled
[03/15/2023-02:25:19] [I] Dump output: Disabled
[03/15/2023-02:25:19] [I] Profile: Disabled
[03/15/2023-02:25:19] [I] Export timing to JSON file:
[03/15/2023-02:25:19] [I] Export output to JSON file:
[03/15/2023-02:25:19] [I] Export profile to JSON file:
[03/15/2023-02:25:19] [I]
[03/15/2023-02:25:19] [I] === Device Information ===
[03/15/2023-02:25:19] [I] Selected Device: NVIDIA Tegra X2
[03/15/2023-02:25:19] [I] Compute Capability: 6.2
[03/15/2023-02:25:19] [I] SMs: 2
[03/15/2023-02:25:19] [I] Compute Clock Rate: 1.3 GHz
[03/15/2023-02:25:19] [I] Device Global Memory: 7858 MiB
[03/15/2023-02:25:19] [I] Shared Memory per SM: 64 KiB
[03/15/2023-02:25:19] [I] Memory Bus Width: 128 bits (ECC disabled)
[03/15/2023-02:25:19] [I] Memory Clock Rate: 1.3 GHz
[03/15/2023-02:25:19] [I]
[03/15/2023-02:25:19] [I] TensorRT version: 8.2.1
[03/15/2023-02:25:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 285, GPU 6235 (MiB)
[03/15/2023-02:25:21] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 285 MiB, GPU 6235 MiB
[03/15/2023-02:25:22] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 314 MiB, GPU 6265 MiB
[03/15/2023-02:25:22] [I] Start parsing network model
[03/15/2023-02:25:22] [I] [TRT] ----------------------------------------------------------------
[03/15/2023-02:25:22] [I] [TRT] Input filename:   /home/nvidia/my_model.onnx
[03/15/2023-02:25:22] [I] [TRT] ONNX IR version:  0.0.7
[03/15/2023-02:25:22] [I] [TRT] Opset version:    12
[03/15/2023-02:25:22] [I] [TRT] Producer name:    pytorch
[03/15/2023-02:25:22] [I] [TRT] Producer version: 1.13.0
[03/15/2023-02:25:22] [I] [TRT] Domain:
[03/15/2023-02:25:22] [I] [TRT] Model version:    0
[03/15/2023-02:25:22] [I] [TRT] Doc string:
[03/15/2023-02:25:22] [I] [TRT] ----------------------------------------------------------------
[03/15/2023-02:25:22] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/15/2023-02:25:22] [I] Finish parsing network model
[03/15/2023-02:25:22] [I] [TRT] ---------- Layers Running on DLA ----------
[03/15/2023-02:25:22] [I] [TRT] ---------- Layers Running on GPU ----------
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.0/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.0/act/Sigmoid), /model.0/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.1/act/Sigmoid), /model.1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/cv1/conv/Conv || /model.2/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/cv1/act/Sigmoid), /model.2/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/m/m.0/cv1/act/Sigmoid), /model.2/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.2/m/m.0/cv2/act/Sigmoid), /model.2/m/m.0/cv2/act/Mul), /model.2/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/cv2/act/Sigmoid), /model.2/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.2/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.2/cv3/act/Sigmoid), /model.2/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.3/act/Sigmoid), /model.3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/cv1/conv/Conv || /model.4/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/cv1/act/Sigmoid), /model.4/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/m/m.0/cv1/act/Sigmoid), /model.4/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.4/m/m.0/cv2/act/Sigmoid), /model.4/m/m.0/cv2/act/Mul), /model.4/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/cv2/act/Sigmoid), /model.4/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.4/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.4/cv3/act/Sigmoid), /model.4/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.5/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.5/act/Sigmoid), /model.5/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/cv1/conv/Conv || /model.6/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/cv1/act/Sigmoid), /model.6/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/m/m.0/cv1/act/Sigmoid), /model.6/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.6/m/m.0/cv2/act/Sigmoid), /model.6/m/m.0/cv2/act/Mul), /model.6/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/cv2/act/Sigmoid), /model.6/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.6/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.6/cv3/act/Sigmoid), /model.6/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.7/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.7/act/Sigmoid), /model.7/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/cv1/conv/Conv || /model.8/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/cv1/act/Sigmoid), /model.8/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/m/m.0/cv1/act/Sigmoid), /model.8/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(PWN(/model.8/m/m.0/cv2/act/Sigmoid), /model.8/m/m.0/cv2/act/Mul), /model.8/m/m.0/Add)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/cv2/act/Sigmoid), /model.8/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.8/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.8/cv3/act/Sigmoid), /model.8/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.9/cv1/act/Sigmoid), /model.9/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m/MaxPool
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_1/MaxPool
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_2/MaxPool
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/cv1/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m/MaxPool_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_1/MaxPool_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/m_2/MaxPool_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.9/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.9/cv2/act/Sigmoid), /model.9/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.10/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.10/act/Sigmoid), /model.10/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.11/Resize
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.11/Resize_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/cv1/conv/Conv || /model.13/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/cv1/act/Sigmoid), /model.13/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/m/m.0/cv1/act/Sigmoid), /model.13/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/m/m.0/cv2/act/Sigmoid), /model.13/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/cv2/act/Sigmoid), /model.13/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.13/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.13/cv3/act/Sigmoid), /model.13/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.14/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.14/act/Sigmoid), /model.14/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.15/Resize
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.15/Resize_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/cv1/conv/Conv || /model.17/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/cv1/act/Sigmoid), /model.17/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/m/m.0/cv1/act/Sigmoid), /model.17/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/m/m.0/cv2/act/Sigmoid), /model.17/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/cv2/act/Sigmoid), /model.17/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.17/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.17/cv3/act/Sigmoid), /model.17/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.18/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.18/act/Sigmoid), /model.18/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.18/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.14/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/cv1/conv/Conv || /model.20/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/cv1/act/Sigmoid), /model.20/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/m/m.0/cv1/act/Sigmoid), /model.20/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/m/m.0/cv2/act/Sigmoid), /model.20/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/cv2/act/Sigmoid), /model.20/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.20/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.20/cv3/act/Sigmoid), /model.20/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.21/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.21/act/Sigmoid), /model.21/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.10/act/Mul_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/cv1/conv/Conv || /model.23/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/cv1/act/Sigmoid), /model.23/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/m/m.0/cv1/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/m/m.0/cv1/act/Sigmoid), /model.23/m/m.0/cv1/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/m/m.0/cv2/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/m/m.0/cv2/act/Sigmoid), /model.23/m/m.0/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/cv2/act/Sigmoid), /model.23/cv2/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.23/cv3/conv/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(PWN(/model.23/cv3/act/Sigmoid), /model.23/cv3/act/Mul)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/m.0/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape + /model.24/Transpose
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Sigmoid)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_0
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_1_output_0 + (Unnamed Layer* 183) [Shuffle] + /model.24/Mul
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_2_output_0 + /model.24/Add
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_3_output_0 + (Unnamed Layer* 188) [Shuffle] + /model.24/Mul_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Constant_5_output_0 + (Unnamed Layer* 194) [Shuffle], PWN(/model.24/Constant_4_output_0 + (Unnamed Layer* 191) [Shuffle] + /model.24/Mul_2, /model.24/Pow))
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_6_output_0 + /model.24/Mul_3
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_1_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_3_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_output_2 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/m.1/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_2 + /model.24/Transpose_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Sigmoid_1)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_3
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_4
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_9_output_0 + (Unnamed Layer* 208) [Shuffle] + /model.24/Mul_4
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_10_output_0 + /model.24/Add_1
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_11_output_0 + (Unnamed Layer* 213) [Shuffle] + /model.24/Mul_5
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Constant_13_output_0 + (Unnamed Layer* 219) [Shuffle], PWN(/model.24/Constant_12_output_0 + (Unnamed Layer* 216) [Shuffle] + /model.24/Mul_6, /model.24/Pow_1))
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_14_output_0 + /model.24/Mul_7
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_5_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_7_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_1_output_2 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_3
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/m.2/Conv
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_4 + /model.24/Transpose_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Sigmoid_2)
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2_5
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2_6
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_17_output_0 + (Unnamed Layer* 233) [Shuffle] + /model.24/Mul_8
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_18_output_0 + /model.24/Add_2
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_19_output_0 + (Unnamed Layer* 238) [Shuffle] + /model.24/Mul_9
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] PWN(/model.24/Constant_21_output_0 + (Unnamed Layer* 244) [Shuffle], PWN(/model.24/Constant_20_output_0 + (Unnamed Layer* 241) [Shuffle] + /model.24/Mul_10, /model.24/Pow_2))
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Constant_22_output_0 + /model.24/Mul_11
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_9_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Mul_11_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Split_2_output_2 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_5
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_1_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_3_output_0 copy
[03/15/2023-02:25:22] [I] [TRT] [GpuLayer] /model.24/Reshape_5_output_0 copy
[03/15/2023-02:25:23] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +294, now: CPU 483, GPU 6562 (MiB)
[03/15/2023-02:25:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +250, GPU +310, now: CPU 733, GPU 6872 (MiB)
[03/15/2023-02:25:25] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/15/2023-02:27:18] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[03/15/2023-02:27:18] [I] [TRT] Total Host Persistent Memory: 105664
[03/15/2023-02:27:18] [I] [TRT] Total Device Persistent Memory: 545792
[03/15/2023-02:27:18] [I] [TRT] Total Scratch Memory: 0
[03/15/2023-02:27:18] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 18 MiB
[03/15/2023-02:27:18] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 53.3691ms to assign 9 blocks to 136 nodes requiring 483330 bytes.
[03/15/2023-02:27:18] [I] [TRT] Total Activation Memory: 483330
[03/15/2023-02:27:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 989, GPU 6996 (MiB)
[03/15/2023-02:27:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 989, GPU 6996 (MiB)
[03/15/2023-02:27:18] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 982, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] Loaded engine size: 1 MiB
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 988, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 988, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[03/15/2023-02:27:19] [I] Engine built in 119.178 sec.
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 957, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 957, GPU 6996 (MiB)
[03/15/2023-02:27:19] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[03/15/2023-02:27:19] [I] Using random values for input images
[03/15/2023-02:27:19] [I] Created input binding for images with dimensions 1x3x160x160
[03/15/2023-02:27:19] [I] Using random values for output output0
[03/15/2023-02:27:19] [I] Created output binding for output0 with dimensions 1x525x6
[03/15/2023-02:27:19] [I] Starting inference
[03/15/2023-02:27:22] [I] Warmup completed 49 queries over 200 ms
[03/15/2023-02:27:22] [I] Timing trace has 868 queries over 3.00742 s
[03/15/2023-02:27:22] [I]
[03/15/2023-02:27:22] [I] === Trace details ===
[03/15/2023-02:27:22] [I] Trace averages of 10 runs:
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37973 ms - Host latency: 3.44337 ms (end to end 3.45868 ms, enqueue 3.37324 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36009 ms - Host latency: 3.42313 ms (end to end 3.4379 ms, enqueue 3.35294 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35954 ms - Host latency: 3.4242 ms (end to end 3.43946 ms, enqueue 3.35337 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36441 ms - Host latency: 3.42755 ms (end to end 3.44239 ms, enqueue 3.35734 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35561 ms - Host latency: 3.41825 ms (end to end 3.43305 ms, enqueue 3.34828 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35867 ms - Host latency: 3.42149 ms (end to end 3.43646 ms, enqueue 3.35217 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3594 ms - Host latency: 3.42205 ms (end to end 3.43697 ms, enqueue 3.35123 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35564 ms - Host latency: 3.41833 ms (end to end 3.4334 ms, enqueue 3.34905 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36611 ms - Host latency: 3.42921 ms (end to end 3.44414 ms, enqueue 3.35906 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37684 ms - Host latency: 3.44072 ms (end to end 3.45593 ms, enqueue 3.37038 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37711 ms - Host latency: 3.44067 ms (end to end 3.45662 ms, enqueue 3.36978 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37596 ms - Host latency: 3.43929 ms (end to end 3.45435 ms, enqueue 3.36937 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3633 ms - Host latency: 3.42613 ms (end to end 3.44116 ms, enqueue 3.35617 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35262 ms - Host latency: 3.4149 ms (end to end 3.42965 ms, enqueue 3.346 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35351 ms - Host latency: 3.41588 ms (end to end 3.43211 ms, enqueue 3.34682 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35709 ms - Host latency: 3.41956 ms (end to end 3.43444 ms, enqueue 3.35045 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36039 ms - Host latency: 3.42296 ms (end to end 3.43767 ms, enqueue 3.35323 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36906 ms - Host latency: 3.43171 ms (end to end 3.44664 ms, enqueue 3.36246 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36072 ms - Host latency: 3.42542 ms (end to end 3.44056 ms, enqueue 3.35414 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36259 ms - Host latency: 3.42549 ms (end to end 3.44034 ms, enqueue 3.35581 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3578 ms - Host latency: 3.42057 ms (end to end 3.4355 ms, enqueue 3.35126 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.38356 ms - Host latency: 3.44731 ms (end to end 3.46257 ms, enqueue 3.37688 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.41713 ms - Host latency: 3.48584 ms (end to end 3.50579 ms, enqueue 3.41525 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34083 ms - Host latency: 3.40789 ms (end to end 3.42914 ms, enqueue 3.34422 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35865 ms - Host latency: 3.42581 ms (end to end 3.44727 ms, enqueue 3.36218 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.38176 ms - Host latency: 3.45028 ms (end to end 3.47119 ms, enqueue 3.38424 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3583 ms - Host latency: 3.42749 ms (end to end 3.44839 ms, enqueue 3.36051 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35183 ms - Host latency: 3.42036 ms (end to end 3.44188 ms, enqueue 3.35532 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.355 ms - Host latency: 3.42231 ms (end to end 3.44413 ms, enqueue 3.35844 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34822 ms - Host latency: 3.41572 ms (end to end 3.43705 ms, enqueue 3.35184 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3463 ms - Host latency: 3.4151 ms (end to end 3.43523 ms, enqueue 3.34885 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3444 ms - Host latency: 3.41129 ms (end to end 3.43234 ms, enqueue 3.34785 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35765 ms - Host latency: 3.42441 ms (end to end 3.4458 ms, enqueue 3.36113 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34667 ms - Host latency: 3.41409 ms (end to end 3.43542 ms, enqueue 3.35022 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36234 ms - Host latency: 3.43009 ms (end to end 3.45153 ms, enqueue 3.36414 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34717 ms - Host latency: 3.41427 ms (end to end 3.43562 ms, enqueue 3.35096 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34348 ms - Host latency: 3.41075 ms (end to end 3.43204 ms, enqueue 3.34717 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.341 ms - Host latency: 3.40814 ms (end to end 3.42936 ms, enqueue 3.34465 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34864 ms - Host latency: 3.41658 ms (end to end 3.43878 ms, enqueue 3.35217 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35906 ms - Host latency: 3.42684 ms (end to end 3.44821 ms, enqueue 3.36271 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35138 ms - Host latency: 3.41849 ms (end to end 3.43915 ms, enqueue 3.35476 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34567 ms - Host latency: 3.41276 ms (end to end 3.43381 ms, enqueue 3.34886 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34243 ms - Host latency: 3.40969 ms (end to end 3.43102 ms, enqueue 3.34606 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34042 ms - Host latency: 3.40753 ms (end to end 3.42848 ms, enqueue 3.34392 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33904 ms - Host latency: 3.40852 ms (end to end 3.42989 ms, enqueue 3.34269 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3588 ms - Host latency: 3.42646 ms (end to end 3.44789 ms, enqueue 3.36224 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35725 ms - Host latency: 3.42472 ms (end to end 3.44568 ms, enqueue 3.36042 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3537 ms - Host latency: 3.42214 ms (end to end 3.44257 ms, enqueue 3.3561 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3525 ms - Host latency: 3.41919 ms (end to end 3.4402 ms, enqueue 3.35583 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3387 ms - Host latency: 3.40544 ms (end to end 3.42803 ms, enqueue 3.3422 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34189 ms - Host latency: 3.40869 ms (end to end 3.42937 ms, enqueue 3.34491 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.39476 ms - Host latency: 3.46364 ms (end to end 3.48589 ms, enqueue 3.39766 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.345 ms - Host latency: 3.41212 ms (end to end 3.43331 ms, enqueue 3.34812 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33256 ms - Host latency: 3.39952 ms (end to end 3.42039 ms, enqueue 3.33608 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37795 ms - Host latency: 3.44673 ms (end to end 3.46829 ms, enqueue 3.38084 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37588 ms - Host latency: 3.44421 ms (end to end 3.46567 ms, enqueue 3.37917 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35256 ms - Host latency: 3.42034 ms (end to end 3.44099 ms, enqueue 3.35574 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37051 ms - Host latency: 3.43982 ms (end to end 3.46125 ms, enqueue 3.37292 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34854 ms - Host latency: 3.41553 ms (end to end 3.4364 ms, enqueue 3.352 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34839 ms - Host latency: 3.41526 ms (end to end 3.43625 ms, enqueue 3.35137 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35237 ms - Host latency: 3.41931 ms (end to end 3.44055 ms, enqueue 3.35562 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35586 ms - Host latency: 3.42485 ms (end to end 3.44602 ms, enqueue 3.35916 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34526 ms - Host latency: 3.41223 ms (end to end 3.43298 ms, enqueue 3.34863 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35427 ms - Host latency: 3.42124 ms (end to end 3.44185 ms, enqueue 3.35698 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34683 ms - Host latency: 3.41375 ms (end to end 3.43489 ms, enqueue 3.3499 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33723 ms - Host latency: 3.4041 ms (end to end 3.4252 ms, enqueue 3.34092 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34954 ms - Host latency: 3.41663 ms (end to end 3.43787 ms, enqueue 3.35273 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.42983 ms - Host latency: 3.498 ms (end to end 3.51914 ms, enqueue 3.43201 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3801 ms - Host latency: 3.44836 ms (end to end 3.46956 ms, enqueue 3.38306 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37227 ms - Host latency: 3.43989 ms (end to end 3.46147 ms, enqueue 3.37527 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34736 ms - Host latency: 3.41448 ms (end to end 3.43557 ms, enqueue 3.35083 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34255 ms - Host latency: 3.40999 ms (end to end 3.43108 ms, enqueue 3.34597 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34573 ms - Host latency: 3.4136 ms (end to end 3.43408 ms, enqueue 3.34851 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34307 ms - Host latency: 3.40974 ms (end to end 3.43057 ms, enqueue 3.34639 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.34873 ms - Host latency: 3.41541 ms (end to end 3.43628 ms, enqueue 3.35232 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.33437 ms - Host latency: 3.40232 ms (end to end 3.42332 ms, enqueue 3.33765 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35198 ms - Host latency: 3.41943 ms (end to end 3.4405 ms, enqueue 3.35525 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.3438 ms - Host latency: 3.41135 ms (end to end 3.43198 ms, enqueue 3.34687 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.346 ms - Host latency: 3.41392 ms (end to end 3.43503 ms, enqueue 3.34949 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35764 ms - Host latency: 3.42578 ms (end to end 3.44717 ms, enqueue 3.36123 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.39685 ms - Host latency: 3.46648 ms (end to end 3.48865 ms, enqueue 3.40032 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36519 ms - Host latency: 3.43274 ms (end to end 3.45403 ms, enqueue 3.36885 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.35559 ms - Host latency: 3.42263 ms (end to end 3.44382 ms, enqueue 3.35767 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37708 ms - Host latency: 3.44573 ms (end to end 3.46748 ms, enqueue 3.37974 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.37043 ms - Host latency: 3.43958 ms (end to end 3.46052 ms, enqueue 3.37292 ms)
[03/15/2023-02:27:22] [I] Average on 10 runs - GPU latency: 3.36682 ms - Host latency: 3.43455 ms (end to end 3.45488 ms, enqueue 3.36936 ms)
[03/15/2023-02:27:22] [I]
[03/15/2023-02:27:22] [I] === Performance summary ===
[03/15/2023-02:27:22] [I] Throughput: 288.619 qps
[03/15/2023-02:27:22] [I] Latency: min = 3.37036 ms, max = 3.8103 ms, mean = 3.42449 ms, median = 3.41943 ms, percentile(99%) = 3.55396 ms
[03/15/2023-02:27:22] [I] End-to-End Host Latency: min = 3.39087 ms, max = 3.83447 ms, mean = 3.44411 ms, median = 3.43848 ms, percentile(99%) = 3.57617 ms
[03/15/2023-02:27:22] [I] Enqueue Time: min = 3.30713 ms, max = 3.74487 ms, mean = 3.35857 ms, median = 3.35376 ms, percentile(99%) = 3.48315 ms
[03/15/2023-02:27:22] [I] H2D Latency: min = 0.038147 ms, max = 0.0578613 ms, mean = 0.0421989 ms, median = 0.0427246 ms, percentile(99%) = 0.0466309 ms
[03/15/2023-02:27:22] [I] GPU Compute Time: min = 3.30371 ms, max = 3.74072 ms, mean = 3.35798 ms, median = 3.35327 ms, percentile(99%) = 3.4812 ms
[03/15/2023-02:27:22] [I] D2H Latency: min = 0.0227051 ms, max = 0.0405273 ms, mean = 0.0243122 ms, median = 0.0241699 ms, percentile(99%) = 0.0296631 ms
[03/15/2023-02:27:22] [I] Total Host Walltime: 3.00742 s
[03/15/2023-02:27:22] [I] Total GPU Compute Time: 2.91473 s
[03/15/2023-02:27:22] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[03/15/2023-02:27:22] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[03/15/2023-02:27:22] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/15/2023-02:27:22] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=/home/nvidia/my_model.onnx
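A few lines of arithmetic show that the performance summary is internally consistent. This is a minimal sketch that assumes trtexec reports throughput as the number of inferences divided by the total host walltime. The figures are copied from the summary above, and the variable names are illustrative only. Both calculations give about 868 inferences. The GPU was computing for roughly 97% of the wall time, which suggests that the enqueue-bound warning costs little throughput in this particular run.

```python
# Figures copied verbatim from the trtexec performance summary (TensorRT 8.2 run).
throughput_qps = 288.619
host_walltime_s = 3.00742
total_gpu_compute_s = 2.91473
mean_gpu_compute_ms = 3.35798

# Assumption: throughput = inferences / host walltime, so inferences = throughput * walltime.
queries_from_throughput = throughput_qps * host_walltime_s
# Independent estimate: total GPU compute time divided by the mean per-inference compute time.
queries_from_gpu_time = total_gpu_compute_s / (mean_gpu_compute_ms / 1000.0)
# Fraction of the wall time during which the GPU was actually computing.
gpu_busy_fraction = total_gpu_compute_s / host_walltime_s

print(round(queries_from_throughput), round(queries_from_gpu_time))  # 868 868
print(f"{gpu_busy_fraction:.1%}")  # 96.9%
```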

Polzovatel1186 · March 16, 2023, 12:03pm

Hello, epi1!
I faced the same problem on a Jetson Xavier NX with JetPack 4.4.
I also tried to run the YOLOv5 nano network, and it failed with the same error.
Could you tell me whether you have managed to fix this issue?
What steps did you take to solve it?
Thank you!

epi1 · March 16, 2023, 12:40pm

I have not solved the problem.
I can only confirm that the problem is gone on JetPack 4.6.1 with its accompanying TensorRT 8.2.
I was also not able to downgrade to TensorRT 7.1.0 (which might help as well).

@SivaRamaKrishnaNV, which open-source ONNX parser do you mean? I tried loading the ONNX model with the onnx library and then passing it to TensorRT through its Python API, without success. Should that approach work in general?
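The `onnx` library can check programmatically whether the ONNX file itself contains repeated node names before the model goes to TensorRT. This is more reliable than inspecting the graph by eye in netron. The following is a minimal sketch that assumes the `onnx` Python package is installed. `find_duplicate_names` and `onnx_node_names` are hypothetical helper names, and the sketch inspects only top-level graph nodes, not subgraphs inside `If`/`Loop` nodes. If the file contains no duplicates, the repeated layer name is presumably created inside the TensorRT 7.1 ONNX parser.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_names(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: return each non-empty name that occurs more than once, sorted."""
    counts = Counter(n for n in names if n)
    return sorted(name for name, c in counts.items() if c > 1)


def onnx_node_names(path: str) -> List[str]:
    """Hypothetical helper: list the names of the top-level nodes of an ONNX graph."""
    import onnx  # imported lazily so find_duplicate_names works without onnx installed

    model = onnx.load(path)
    return [node.name for node in model.graph.node]
```

For example, `find_duplicate_names(onnx_node_names("my_model.onnx"))` returns an empty list when every top-level node name in the file is unique.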

SivaRamaKrishnaNV · March 20, 2023, 8:13am

Dear @epi1,
I am glad to hear that the issue is resolved with JetPack 4.6.1. In general, we recommend using the latest version so that you get access to new features and bug fixes.
I was referring to the GitHub repo onnx/onnx-tensorrt (ONNX-TensorRT: TensorRT backend for ONNX).
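If upgrading from JetPack 4.4 is not an option, one experiment may be worth trying: give every ONNX node a short, globally unique name before running `trtexec`. This is an untested assumption, not a fix confirmed in this thread. The idea is that the TensorRT 7.1 parser might derive layer names from node names in a way that collides with an existing node such as `/model.24/Split_1`. The following is a minimal sketch that assumes the `onnx` Python package is available. `unique_node_names`, `rename_onnx_nodes` and the `node_` prefix are hypothetical. Only node names change. The tensor names that connect the graph are left alone, so the model's computation should be unaffected.

```python
from typing import List


def unique_node_names(count: int, prefix: str = "node_") -> List[str]:
    """Hypothetical helper: generate `count` distinct names node_0, node_1, ..."""
    return [f"{prefix}{i}" for i in range(count)]


def rename_onnx_nodes(src: str, dst: str, prefix: str = "node_") -> None:
    """Hypothetical workaround: rename every top-level node and save a copy.

    Assumption (untested): unique, unscoped node names avoid the
    "Repeated layer name" error in the TensorRT 7.1 ONNX parser.
    Tensor (input/output) names are not touched, so graph connectivity is kept.
    """
    import onnx  # imported lazily so unique_node_names works without onnx installed

    model = onnx.load(src)
    nodes = model.graph.node
    for node, name in zip(nodes, unique_node_names(len(nodes), prefix)):
        node.name = name
    onnx.checker.check_model(model)  # sanity-check the edited model before saving
    onnx.save(model, dst)
```

For example, call `rename_onnx_nodes("my_model.onnx", "my_model_renamed.onnx")` and then run `trtexec --onnx=my_model_renamed.onnx`.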

system · April 3, 2023, 8:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
