Dear @0xdeadbeef,
Just to update you: I was able to build and run the model using trtexec on the target. I will check the engine loading issue on the DW side and get back to you.
nvidia@tegra-ubuntu:~/siva$ /usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/siva/yolov3.onnx --saveEngine=/home/nvidia/siva/yolov3.bin
&&&& RUNNING TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/siva/yolov3.onnx --saveEngine=/home/nvidia/siva/yolov3.bin
[08/11/2023-04:54:05] [I] === Model Options ===
[08/11/2023-04:54:05] [I] Format: ONNX
[08/11/2023-04:54:05] [I] Model: /home/nvidia/siva/yolov3.onnx
[08/11/2023-04:54:05] [I] Output:
[08/11/2023-04:54:05] [I] === Build Options ===
[08/11/2023-04:54:05] [I] Max batch: explicit batch
[08/11/2023-04:54:05] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/11/2023-04:54:05] [I] minTiming: 1
[08/11/2023-04:54:05] [I] avgTiming: 8
[08/11/2023-04:54:05] [I] Precision: FP32
[08/11/2023-04:54:05] [I] LayerPrecisions:
[08/11/2023-04:54:05] [I] Layer Device Types:
[08/11/2023-04:54:05] [I] Calibration:
[08/11/2023-04:54:05] [I] Refit: Disabled
[08/11/2023-04:54:05] [I] Sparsity: Disabled
[08/11/2023-04:54:05] [I] Safe mode: Disabled
[08/11/2023-04:54:05] [I] DirectIO mode: Disabled
[08/11/2023-04:54:05] [I] Restricted mode: Disabled
[08/11/2023-04:54:05] [I] Build only: Disabled
[08/11/2023-04:54:05] [I] Save engine: /home/nvidia/siva/yolov3.bin
[08/11/2023-04:54:05] [I] Load engine:
[08/11/2023-04:54:05] [I] Profiling verbosity: 0
[08/11/2023-04:54:05] [I] Tactic sources: Using default tactic sources
[08/11/2023-04:54:05] [I] timingCacheMode: local
[08/11/2023-04:54:05] [I] timingCacheFile:
[08/11/2023-04:54:05] [I] Heuristic: Disabled
[08/11/2023-04:54:05] [I] Preview Features: Use default preview flags.
[08/11/2023-04:54:05] [I] Input(s)s format: fp32:CHW
[08/11/2023-04:54:05] [I] Output(s)s format: fp32:CHW
[08/11/2023-04:54:05] [I] Input build shapes: model
[08/11/2023-04:54:05] [I] Input calibration shapes: model
[08/11/2023-04:54:05] [I] === System Options ===
[08/11/2023-04:54:05] [I] Device: 0
[08/11/2023-04:54:05] [I] DLACore:
[08/11/2023-04:54:05] [I] Plugins:
[08/11/2023-04:54:05] [I] === Inference Options ===
[08/11/2023-04:54:05] [I] Batch: Explicit
[08/11/2023-04:54:05] [I] Input inference shapes: model
[08/11/2023-04:54:05] [I] Iterations: 10
[08/11/2023-04:54:05] [I] Duration: 3s (+ 200ms warm up)
[08/11/2023-04:54:05] [I] Sleep time: 0ms
[08/11/2023-04:54:05] [I] Idle time: 0ms
[08/11/2023-04:54:05] [I] Streams: 1
[08/11/2023-04:54:05] [I] ExposeDMA: Disabled
[08/11/2023-04:54:05] [I] Data transfers: Enabled
[08/11/2023-04:54:05] [I] Spin-wait: Disabled
[08/11/2023-04:54:05] [I] Multithreading: Disabled
[08/11/2023-04:54:05] [I] CUDA Graph: Disabled
[08/11/2023-04:54:05] [I] Separate profiling: Disabled
[08/11/2023-04:54:05] [I] Time Deserialize: Disabled
[08/11/2023-04:54:05] [I] Time Refit: Disabled
[08/11/2023-04:54:05] [I] NVTX verbosity: 0
[08/11/2023-04:54:05] [I] Persistent Cache Ratio: 0
[08/11/2023-04:54:05] [I] Inputs:
[08/11/2023-04:54:05] [I] === Reporting Options ===
[08/11/2023-04:54:05] [I] Verbose: Disabled
[08/11/2023-04:54:05] [I] Averages: 10 inferences
[08/11/2023-04:54:05] [I] Percentiles: 90,95,99
[08/11/2023-04:54:05] [I] Dump refittable layers:Disabled
[08/11/2023-04:54:05] [I] Dump output: Disabled
[08/11/2023-04:54:05] [I] Profile: Disabled
[08/11/2023-04:54:05] [I] Export timing to JSON file:
[08/11/2023-04:54:05] [I] Export output to JSON file:
[08/11/2023-04:54:05] [I] Export profile to JSON file:
[08/11/2023-04:54:05] [I]
[08/11/2023-04:54:05] [I] === Device Information ===
[08/11/2023-04:54:05] [I] Selected Device: Orin
[08/11/2023-04:54:05] [I] Compute Capability: 8.7
[08/11/2023-04:54:05] [I] SMs: 16
[08/11/2023-04:54:05] [I] Compute Clock Rate: 1.275 GHz
[08/11/2023-04:54:05] [I] Device Global Memory: 28458 MiB
[08/11/2023-04:54:05] [I] Shared Memory per SM: 164 KiB
[08/11/2023-04:54:05] [I] Memory Bus Width: 128 bits (ECC disabled)
[08/11/2023-04:54:05] [I] Memory Clock Rate: 1.275 GHz
[08/11/2023-04:54:05] [I]
[08/11/2023-04:54:05] [I] TensorRT version: 8.5.10
[08/11/2023-04:54:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +269, GPU +0, now: CPU 298, GPU 5475 (MiB)
[08/11/2023-04:54:07] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +266, GPU +252, now: CPU 583, GPU 5744 (MiB)
[08/11/2023-04:54:07] [I] Start parsing network model
[08/11/2023-04:54:07] [I] [TRT] ----------------------------------------------------------------
[08/11/2023-04:54:07] [I] [TRT] Input filename: /home/nvidia/siva/yolov3.onnx
[08/11/2023-04:54:07] [I] [TRT] ONNX IR version: 0.0.8
[08/11/2023-04:54:07] [I] [TRT] Opset version: 17
[08/11/2023-04:54:07] [I] [TRT] Producer name: NVIDIA TensorRT sample
[08/11/2023-04:54:07] [I] [TRT] Producer version:
[08/11/2023-04:54:07] [I] [TRT] Domain:
[08/11/2023-04:54:07] [I] [TRT] Model version: 0
[08/11/2023-04:54:07] [I] [TRT] Doc string:
[08/11/2023-04:54:07] [I] [TRT] ----------------------------------------------------------------
[08/11/2023-04:54:07] [I] Finish parsing network model
[08/11/2023-04:54:07] [I] [TRT] ---------- Layers Running on DLA ----------
[08/11/2023-04:54:07] [I] [TRT] No layer is running on DLA
[08/11/2023-04:54:07] [I] [TRT] ---------- Layers Running on GPU ----------
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 001_convolutional + 001_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(001_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 002_convolutional + 002_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(002_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 003_convolutional + 003_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(003_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 004_convolutional + 004_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(004_convolutional_lrelu), 005_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 006_convolutional + 006_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(006_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 007_convolutional + 007_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(007_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 008_convolutional + 008_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(008_convolutional_lrelu), 009_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 010_convolutional + 010_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(010_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 011_convolutional + 011_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(011_convolutional_lrelu), 012_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 013_convolutional + 013_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(013_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 014_convolutional + 014_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(014_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 015_convolutional + 015_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(015_convolutional_lrelu), 016_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 017_convolutional + 017_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(017_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 018_convolutional + 018_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(018_convolutional_lrelu), 019_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 020_convolutional + 020_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(020_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 021_convolutional + 021_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(021_convolutional_lrelu), 022_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 023_convolutional + 023_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(023_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 024_convolutional + 024_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(024_convolutional_lrelu), 025_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 026_convolutional + 026_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(026_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 027_convolutional + 027_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(027_convolutional_lrelu), 028_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 029_convolutional + 029_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(029_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 030_convolutional + 030_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(030_convolutional_lrelu), 031_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 032_convolutional + 032_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(032_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 033_convolutional + 033_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(033_convolutional_lrelu), 034_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 035_convolutional + 035_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(035_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 036_convolutional + 036_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(036_convolutional_lrelu), 037_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 038_convolutional + 038_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(038_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 039_convolutional + 039_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(039_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 040_convolutional + 040_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(040_convolutional_lrelu), 041_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 042_convolutional + 042_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(042_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 043_convolutional + 043_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(043_convolutional_lrelu), 044_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 045_convolutional + 045_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(045_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 046_convolutional + 046_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(046_convolutional_lrelu), 047_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 048_convolutional + 048_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(048_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 049_convolutional + 049_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(049_convolutional_lrelu), 050_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 051_convolutional + 051_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(051_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 052_convolutional + 052_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(052_convolutional_lrelu), 053_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 054_convolutional + 054_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(054_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 055_convolutional + 055_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(055_convolutional_lrelu), 056_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 057_convolutional + 057_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(057_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 058_convolutional + 058_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(058_convolutional_lrelu), 059_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 060_convolutional + 060_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(060_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 061_convolutional + 061_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(061_convolutional_lrelu), 062_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 063_convolutional + 063_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(063_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 064_convolutional + 064_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(064_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 065_convolutional + 065_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(065_convolutional_lrelu), 066_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 067_convolutional + 067_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(067_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 068_convolutional + 068_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(068_convolutional_lrelu), 069_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 070_convolutional + 070_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(070_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 071_convolutional + 071_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(071_convolutional_lrelu), 072_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 073_convolutional + 073_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(073_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 074_convolutional + 074_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(074_convolutional_lrelu), 075_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 076_convolutional + 076_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(076_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 077_convolutional + 077_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(077_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 078_convolutional + 078_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(078_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 079_convolutional + 079_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(079_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 080_convolutional + 080_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(080_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 081_convolutional + 081_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(081_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 082_convolutional
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 085_convolutional + 085_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(085_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] RESIZE: 086_upsample
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] COPY: 086_upsample copy
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 088_convolutional + 088_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(088_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 089_convolutional + 089_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(089_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 090_convolutional + 090_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(090_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 091_convolutional + 091_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(091_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 092_convolutional + 092_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(092_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 093_convolutional + 093_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(093_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 094_convolutional
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 097_convolutional + 097_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(097_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] RESIZE: 098_upsample
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] COPY: 098_upsample copy
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 100_convolutional + 100_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(100_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 101_convolutional + 101_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(101_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 102_convolutional + 102_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(102_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 103_convolutional + 103_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(103_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 104_convolutional + 104_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(104_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 105_convolutional + 105_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(105_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 106_convolutional
[08/11/2023-04:54:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +512, now: CPU 1592, GPU 6701 (MiB)
[08/11/2023-04:54:09] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +77, now: CPU 1675, GPU 6778 (MiB)
[08/11/2023-04:54:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/11/2023-05:41:21] [I] [TRT] Total Activation Memory: 73230150144
[08/11/2023-05:41:21] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[08/11/2023-05:41:21] [I] [TRT] Total Host Persistent Memory: 264672
[08/11/2023-05:41:21] [I] [TRT] Total Device Persistent Memory: 0
[08/11/2023-05:41:21] [I] [TRT] Total Scratch Memory: 0
[08/11/2023-05:41:21] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 252 MiB, GPU 13731 MiB
[08/11/2023-05:41:21] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 237 steps to complete.
[08/11/2023-05:41:21] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 10.8696ms to assign 6 blocks to 237 nodes requiring 7854621184 bytes.
[08/11/2023-05:41:21] [I] [TRT] Total Activation Memory: 7854621184
[08/11/2023-05:41:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +234, GPU +256, now: CPU 234, GPU 256 (MiB)
[08/11/2023-05:41:22] [I] Engine built in 2836.78 sec.
[08/11/2023-05:41:22] [I] [TRT] Loaded engine size: 237 MiB
[08/11/2023-05:41:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +236, now: CPU 0, GPU 236 (MiB)
[08/11/2023-05:41:22] [I] Engine deserialized in 0.0545005 sec.
[08/11/2023-05:41:24] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +7490, now: CPU 0, GPU 7726 (MiB)
[08/11/2023-05:41:24] [I] Setting persistentCacheLimit to 0 bytes.
[08/11/2023-05:41:24] [I] Using random values for input 000_net
[08/11/2023-05:41:24] [I] Created input binding for 000_net with dimensions 64x3x608x608
[08/11/2023-05:41:24] [I] Using random values for output 082_convolutional
[08/11/2023-05:41:24] [I] Created output binding for 082_convolutional with dimensions 64x255x19x19
[08/11/2023-05:41:24] [I] Using random values for output 094_convolutional
[08/11/2023-05:41:24] [I] Created output binding for 094_convolutional with dimensions 64x255x38x38
[08/11/2023-05:41:24] [I] Using random values for output 106_convolutional
[08/11/2023-05:41:25] [I] Created output binding for 106_convolutional with dimensions 64x255x76x76
[08/11/2023-05:41:25] [I] Starting inference
[08/11/2023-05:41:35] [I] Warmup completed 1 queries over 200 ms
[08/11/2023-05:41:35] [I] Timing trace has 10 queries over 10.7294 s
[08/11/2023-05:41:35] [I]
[08/11/2023-05:41:35] [I] === Trace details ===
[08/11/2023-05:41:35] [I] Trace averages of 10 runs:
[08/11/2023-05:41:35] [I] Average on 10 runs - GPU latency: 974.784 ms - Host latency: 1031.78 ms (enqueue 0.981995 ms)
[08/11/2023-05:41:35] [I]
[08/11/2023-05:41:35] [I] === Performance summary ===
[08/11/2023-05:41:35] [I] Throughput: 0.932016 qps
[08/11/2023-05:41:35] [I] Latency: min = 1012.07 ms, max = 1056.42 ms, mean = 1031.78 ms, median = 1027.12 ms, percentile(90%) = 1055.78 ms, percentile(95%) = 1056.42 ms, percentile(99%) = 1056.42 ms
[08/11/2023-05:41:35] [I] Enqueue Time: min = 0.8136 ms, max = 1.31689 ms, mean = 0.981995 ms, median = 0.953003 ms, percentile(90%) = 1.05591 ms, percentile(95%) = 1.31689 ms, percentile(99%) = 1.31689 ms
[08/11/2023-05:41:35] [I] H2D Latency: min = 17.4687 ms, max = 28.2749 ms, mean = 25.3344 ms, median = 25.6876 ms, percentile(90%) = 28.1725 ms, percentile(95%) = 28.2749 ms, percentile(99%) = 28.2749 ms
[08/11/2023-05:41:35] [I] GPU Compute Time: min = 960.146 ms, max = 993.742 ms, mean = 974.784 ms, median = 970.541 ms, percentile(90%) = 993.602 ms, percentile(95%) = 993.742 ms, percentile(99%) = 993.742 ms
[08/11/2023-05:41:35] [I] D2H Latency: min = 13.7354 ms, max = 37.5093 ms, mean = 31.6631 ms, median = 32.4792 ms, percentile(90%) = 37.4163 ms, percentile(95%) = 37.5093 ms, percentile(99%) = 37.5093 ms
[08/11/2023-05:41:35] [I] Total Host Walltime: 10.7294 s
[08/11/2023-05:41:35] [I] Total GPU Compute Time: 9.74784 s
[08/11/2023-05:41:35] [W] * GPU compute time is unstable, with coefficient of variance = 1.44341%.
[08/11/2023-05:41:35] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[08/11/2023-05:41:35] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/11/2023-05:41:35] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/siva/yolov3.onnx --saveEngine=/home/nvidia/siva/yolov3.bin
nvidia@tegra-ubuntu:~/siva$ /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/siva/yolov3.bin
&&&& RUNNING TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/siva/yolov3.bin
[08/11/2023-05:51:03] [I] === Model Options ===
[08/11/2023-05:51:03] [I] Format: *
[08/11/2023-05:51:03] [I] Model:
[08/11/2023-05:51:03] [I] Output:
[08/11/2023-05:51:03] [I] === Build Options ===
[08/11/2023-05:51:03] [I] Max batch: 1
[08/11/2023-05:51:03] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/11/2023-05:51:03] [I] minTiming: 1
[08/11/2023-05:51:03] [I] avgTiming: 8
[08/11/2023-05:51:03] [I] Precision: FP32
[08/11/2023-05:51:03] [I] LayerPrecisions:
[08/11/2023-05:51:03] [I] Layer Device Types:
[08/11/2023-05:51:03] [I] Calibration:
[08/11/2023-05:51:03] [I] Refit: Disabled
[08/11/2023-05:51:03] [I] Sparsity: Disabled
[08/11/2023-05:51:03] [I] Safe mode: Disabled
[08/11/2023-05:51:03] [I] DirectIO mode: Disabled
[08/11/2023-05:51:03] [I] Restricted mode: Disabled
[08/11/2023-05:51:03] [I] Build only: Disabled
[08/11/2023-05:51:03] [I] Save engine:
[08/11/2023-05:51:03] [I] Load engine: /home/nvidia/siva/yolov3.bin
[08/11/2023-05:51:03] [I] Profiling verbosity: 0
[08/11/2023-05:51:03] [I] Tactic sources: Using default tactic sources
[08/11/2023-05:51:03] [I] timingCacheMode: local
[08/11/2023-05:51:03] [I] timingCacheFile:
[08/11/2023-05:51:03] [I] Heuristic: Disabled
[08/11/2023-05:51:03] [I] Preview Features: Use default preview flags.
[08/11/2023-05:51:03] [I] Input(s)s format: fp32:CHW
[08/11/2023-05:51:03] [I] Output(s)s format: fp32:CHW
[08/11/2023-05:51:03] [I] Input build shapes: model
[08/11/2023-05:51:03] [I] Input calibration shapes: model
[08/11/2023-05:51:03] [I] === System Options ===
[08/11/2023-05:51:03] [I] Device: 0
[08/11/2023-05:51:03] [I] DLACore:
[08/11/2023-05:51:03] [I] Plugins:
[08/11/2023-05:51:03] [I] === Inference Options ===
[08/11/2023-05:51:03] [I] Batch: 1
[08/11/2023-05:51:03] [I] Input inference shapes: model
[08/11/2023-05:51:03] [I] Iterations: 10
[08/11/2023-05:51:03] [I] Duration: 3s (+ 200ms warm up)
[08/11/2023-05:51:03] [I] Sleep time: 0ms
[08/11/2023-05:51:03] [I] Idle time: 0ms
[08/11/2023-05:51:03] [I] Streams: 1
[08/11/2023-05:51:03] [I] ExposeDMA: Disabled
[08/11/2023-05:51:03] [I] Data transfers: Enabled
[08/11/2023-05:51:03] [I] Spin-wait: Disabled
[08/11/2023-05:51:03] [I] Multithreading: Disabled
[08/11/2023-05:51:03] [I] CUDA Graph: Disabled
[08/11/2023-05:51:03] [I] Separate profiling: Disabled
[08/11/2023-05:51:03] [I] Time Deserialize: Disabled
[08/11/2023-05:51:03] [I] Time Refit: Disabled
[08/11/2023-05:51:03] [I] NVTX verbosity: 0
[08/11/2023-05:51:03] [I] Persistent Cache Ratio: 0
[08/11/2023-05:51:03] [I] Inputs:
[08/11/2023-05:51:03] [I] === Reporting Options ===
[08/11/2023-05:51:03] [I] Verbose: Disabled
[08/11/2023-05:51:03] [I] Averages: 10 inferences
[08/11/2023-05:51:03] [I] Percentiles: 90,95,99
[08/11/2023-05:51:03] [I] Dump refittable layers:Disabled
[08/11/2023-05:51:03] [I] Dump output: Disabled
[08/11/2023-05:51:03] [I] Profile: Disabled
[08/11/2023-05:51:03] [I] Export timing to JSON file:
[08/11/2023-05:51:03] [I] Export output to JSON file:
[08/11/2023-05:51:03] [I] Export profile to JSON file:
[08/11/2023-05:51:03] [I]
[08/11/2023-05:51:03] [I] === Device Information ===
[08/11/2023-05:51:03] [I] Selected Device: Orin
[08/11/2023-05:51:03] [I] Compute Capability: 8.7
[08/11/2023-05:51:03] [I] SMs: 16
[08/11/2023-05:51:03] [I] Compute Clock Rate: 1.275 GHz
[08/11/2023-05:51:03] [I] Device Global Memory: 28458 MiB
[08/11/2023-05:51:03] [I] Shared Memory per SM: 164 KiB
[08/11/2023-05:51:03] [I] Memory Bus Width: 128 bits (ECC disabled)
[08/11/2023-05:51:03] [I] Memory Clock Rate: 1.275 GHz
[08/11/2023-05:51:03] [I]
[08/11/2023-05:51:03] [I] TensorRT version: 8.5.10
[08/11/2023-05:51:03] [I] Engine loaded in 0.258738 sec.
[08/11/2023-05:51:03] [I] [TRT] Loaded engine size: 237 MiB
[08/11/2023-05:51:03] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +236, now: CPU 0, GPU 236 (MiB)
[08/11/2023-05:51:03] [I] Engine deserialized in 0.541286 sec.
[08/11/2023-05:51:05] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +7490, now: CPU 0, GPU 7726 (MiB)
[08/11/2023-05:51:05] [I] Setting persistentCacheLimit to 0 bytes.
[08/11/2023-05:51:05] [I] Using random values for input 000_net
[08/11/2023-05:51:06] [I] Created input binding for 000_net with dimensions 64x3x608x608
[08/11/2023-05:51:06] [I] Using random values for output 082_convolutional
[08/11/2023-05:51:06] [I] Created output binding for 082_convolutional with dimensions 64x255x19x19
[08/11/2023-05:51:06] [I] Using random values for output 094_convolutional
[08/11/2023-05:51:06] [I] Created output binding for 094_convolutional with dimensions 64x255x38x38
[08/11/2023-05:51:06] [I] Using random values for output 106_convolutional
[08/11/2023-05:51:06] [I] Created output binding for 106_convolutional with dimensions 64x255x76x76
[08/11/2023-05:51:06] [I] Starting inference
[08/11/2023-05:51:16] [I] Warmup completed 1 queries over 200 ms
[08/11/2023-05:51:16] [I] Timing trace has 10 queries over 10.3387 s
[08/11/2023-05:51:16] [I]
[08/11/2023-05:51:16] [I] === Trace details ===
[08/11/2023-05:51:16] [I] Trace averages of 10 runs:
[08/11/2023-05:51:16] [I] Average on 10 runs - GPU latency: 941.794 ms - Host latency: 1000.48 ms (enqueue 0.871793 ms)
[08/11/2023-05:51:16] [I]
[08/11/2023-05:51:16] [I] === Performance summary ===
[08/11/2023-05:51:16] [I] Throughput: 0.967236 qps
[08/11/2023-05:51:16] [I] Latency: min = 963.053 ms, max = 1066.47 ms, mean = 1000.48 ms, median = 977.212 ms, percentile(90%) = 1065.4 ms, percentile(95%) = 1066.47 ms, percentile(99%) = 1066.47 ms
[08/11/2023-05:51:16] [I] Enqueue Time: min = 0.766602 ms, max = 1.02936 ms, mean = 0.871793 ms, median = 0.848633 ms, percentile(90%) = 0.984832 ms, percentile(95%) = 1.02936 ms, percentile(99%) = 1.02936 ms
[08/11/2023-05:51:16] [I] H2D Latency: min = 16.2759 ms, max = 28.3885 ms, mean = 24.8341 ms, median = 24.6426 ms, percentile(90%) = 28.3569 ms, percentile(95%) = 28.3885 ms, percentile(99%) = 28.3885 ms
[08/11/2023-05:51:16] [I] GPU Compute Time: min = 914.301 ms, max = 1000.3 ms, mean = 941.794 ms, median = 919.586 ms, percentile(90%) = 999.095 ms, percentile(95%) = 1000.3 ms, percentile(99%) = 1000.3 ms
[08/11/2023-05:51:16] [I] D2H Latency: min = 13.7393 ms, max = 42.5956 ms, mean = 33.8542 ms, median = 33.4032 ms, percentile(90%) = 41.625 ms, percentile(95%) = 42.5956 ms, percentile(99%) = 42.5956 ms
[08/11/2023-05:51:16] [I] Total Host Walltime: 10.3387 s
[08/11/2023-05:51:16] [I] Total GPU Compute Time: 9.41794 s
[08/11/2023-05:51:16] [W] * GPU compute time is unstable, with coefficient of variance = 4.00771%.
[08/11/2023-05:51:16] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[08/11/2023-05:51:16] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/11/2023-05:51:16] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/siva/yolov3.bin