Hello. Can I get an explanation of how to properly generate a TensorRT engine from the ONNX files provided for the OCDR sample?
• Hardware Platform (Jetson / GPU)
NVIDIA Jetson Orin NX (16 GB RAM)
• DeepStream Version
DeepStream 7.0 (in a Docker container)
• JetPack Version (valid for Jetson only)
JetPack 6.0 (L4T 36.3.0)
• TensorRT Version
8.6.2.3
• NVIDIA GPU Driver Version (valid for GPU only)
How do I find this?
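I am not sure where to look for this on Jetson; I assume the driver comes bundled with L4T and can be checked with something like the commands below (this is a guess on my part):
cat /etc/nv_tegra_release
nvidia-smi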
• Issue Type( questions, new requirements, bugs)
Question/bug?
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
I started the DeepStream 7.0 Triton multiarch Docker container.
I installed libopencv-dev.
I downloaded the different models from the following link.
I git-cloned the NVIDIA OCDR GitHub repository for the sample and ran make to build the libraries (a rough sketch of these steps follows).
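For reference, my setup looked roughly like this. The image tag and paths are from memory and the repository URL is omitted here, so treat this as a sketch rather than an exact reproduction:
docker run -it --rm --runtime nvidia --network host \
    -v /path/to/volume:/volume \
    nvcr.io/nvidia/deepstream:7.0-triton-multiarch
apt-get update && apt-get install -y libopencv-dev
git clone <the NVIDIA OCDR sample repository>
cd <repository directory> && make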
Everything works perfectly with the v1.0 versions of the models and with the v2.0 OCR models. But when I try to generate the engine for the OCD ViT models, the process gets “Killed” after around 15 minutes of building in TensorRT. Am I doing something wrong? Here is my command:
/usr/src/tensorrt/bin/trtexec --onnx=/opt/nvidia/deepstream/deepstream/models/ocdnet_fan_tiny_2x_icdar_pruned.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:1x3x736x1280 --fp16 --saveEngine=ocdnetvit.fp16.engine
The output I get is following:
root@**********:/~/volume# /usr/src/tensorrt/bin/trtexec --onnx=/opt/nvidia/deepstream/deepstream/models/ocdnet_fan_tiny_2x_icdar_pruned.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:1x3x736x1280 --fp16 --saveEngine=ocdnetvit.fp16.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=/opt/nvidia/deepstream/deepstream/models/ocdnet_fan_tiny_2x_icdar_pruned.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:1x3x736x1280 --fp16 --saveEngine=ocdnetvit.fp16.engine
[10/03/2024-12:26:24] [I] === Model Options ===
[10/03/2024-12:26:24] [I] Format: ONNX
[10/03/2024-12:26:24] [I] Model: /opt/nvidia/deepstream/deepstream/models/ocdnet_fan_tiny_2x_icdar_pruned.onnx
[10/03/2024-12:26:24] [I] Output:
[10/03/2024-12:26:24] [I] === Build Options ===
[10/03/2024-12:26:24] [I] Max batch: explicit batch
[10/03/2024-12:26:24] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/03/2024-12:26:24] [I] minTiming: 1
[10/03/2024-12:26:24] [I] avgTiming: 8
[10/03/2024-12:26:24] [I] Precision: FP32+FP16
[10/03/2024-12:26:24] [I] LayerPrecisions:
[10/03/2024-12:26:24] [I] Layer Device Types:
[10/03/2024-12:26:24] [I] Calibration:
[10/03/2024-12:26:24] [I] Refit: Disabled
[10/03/2024-12:26:24] [I] Restricted mode: Disabled
[10/03/2024-12:26:24] [I] Skip inference: Disabled
[10/03/2024-12:26:24] [I] Save engine: ocdnetvit.fp16.engine
[10/03/2024-12:26:24] [I] Load engine:
[10/03/2024-12:26:24] [I] Profiling verbosity: 0
[10/03/2024-12:26:24] [I] Tactic sources: Using default tactic sources
[10/03/2024-12:26:24] [I] timingCacheMode: local
[10/03/2024-12:26:24] [I] timingCacheFile:
[10/03/2024-12:26:24] [I] Heuristic: Disabled
[10/03/2024-12:26:24] [I] Preview Features: Use default preview flags.
[10/03/2024-12:26:24] [I] MaxAuxStreams: -1
[10/03/2024-12:26:24] [I] BuilderOptimizationLevel: -1
[10/03/2024-12:26:24] [I] Input(s)s format: fp32:CHW
[10/03/2024-12:26:24] [I] Output(s)s format: fp32:CHW
[10/03/2024-12:26:24] [I] Input build shape: input=1x3x736x1280+1x3x736x1280+1x3x736x1280
[10/03/2024-12:26:24] [I] Input calibration shapes: model
[10/03/2024-12:26:24] [I] === System Options ===
[10/03/2024-12:26:24] [I] Device: 0
[10/03/2024-12:26:24] [I] DLACore:
[10/03/2024-12:26:24] [I] Plugins:
[10/03/2024-12:26:24] [I] setPluginsToSerialize:
[10/03/2024-12:26:24] [I] dynamicPlugins:
[10/03/2024-12:26:24] [I] ignoreParsedPluginLibs: 0
[10/03/2024-12:26:24] [I]
[10/03/2024-12:26:24] [I] === Inference Options ===
[10/03/2024-12:26:24] [I] Batch: Explicit
[10/03/2024-12:26:24] [I] Input inference shape: input=1x3x736x1280
[10/03/2024-12:26:24] [I] Iterations: 10
[10/03/2024-12:26:24] [I] Duration: 3s (+ 200ms warm up)
[10/03/2024-12:26:24] [I] Sleep time: 0ms
[10/03/2024-12:26:24] [I] Idle time: 0ms
[10/03/2024-12:26:24] [I] Inference Streams: 1
[10/03/2024-12:26:24] [I] Data transfers: Enabled
[10/03/2024-12:26:24] [I] Spin-wait: Disabled
[10/03/2024-12:26:24] [I] Multithreading: Disabled
[10/03/2024-12:26:24] [I] CUDA Graph: Disabled
[10/03/2024-12:26:24] [I] Separate profiling: Disabled
[10/03/2024-12:26:24] [I] Time Deserialize: Disabled
[10/03/2024-12:26:24] [I] Time Refit: Disabled
[10/03/2024-12:26:24] [I] NVTX verbosity: 0
[10/03/2024-12:26:24] [I] Persistent Cache Ratio: 0
[10/03/2024-12:26:24] [I] Inputs:
[10/03/2024-12:26:24] [I] === Reporting Options ===
[10/03/2024-12:26:24] [I] Verbose: Disabled
[10/03/2024-12:26:24] [I] Averages: 10 inferences
[10/03/2024-12:26:24] [I] Percentiles: 90,95,99
[10/03/2024-12:26:24] [I] Dump refittable layers:Disabled
[10/03/2024-12:26:24] [I] Dump output: Disabled
[10/03/2024-12:26:24] [I] Profile: Disabled
[10/03/2024-12:26:24] [I] Export timing to JSON file:
[10/03/2024-12:26:24] [I] Export output to JSON file:
[10/03/2024-12:26:24] [I] Export profile to JSON file:
[10/03/2024-12:26:24] [I]
[10/03/2024-12:26:24] [I] === Device Information ===
[10/03/2024-12:26:24] [I] Selected Device: Orin
[10/03/2024-12:26:24] [I] Compute Capability: 8.7
[10/03/2024-12:26:24] [I] SMs: 8
[10/03/2024-12:26:24] [I] Device Global Memory: 15656 MiB
[10/03/2024-12:26:24] [I] Shared Memory per SM: 164 KiB
[10/03/2024-12:26:24] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/03/2024-12:26:24] [I] Application Compute Clock Rate: 0.918 GHz
[10/03/2024-12:26:24] [I] Application Memory Clock Rate: 0.918 GHz
[10/03/2024-12:26:24] [I]
[10/03/2024-12:26:24] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/03/2024-12:26:24] [I]
[10/03/2024-12:26:24] [I] TensorRT version: 8.6.2
[10/03/2024-12:26:24] [I] Loading standard plugins
[10/03/2024-12:26:24] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 4900 (MiB)
[10/03/2024-12:26:29] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1112, now: CPU 1223, GPU 6055 (MiB)
[10/03/2024-12:26:29] [I] Start parsing network model.
[10/03/2024-12:26:29] [I] [TRT] ----------------------------------------------------------------
[10/03/2024-12:26:29] [I] [TRT] Input filename: /opt/nvidia/deepstream/deepstream/models/ocdnet_fan_tiny_2x_icdar_pruned.onnx
[10/03/2024-12:26:29] [I] [TRT] ONNX IR version: 0.0.8
[10/03/2024-12:26:29] [I] [TRT] Opset version: 17
[10/03/2024-12:26:29] [I] [TRT] Producer name: pytorch
[10/03/2024-12:26:29] [I] [TRT] Producer version: 1.14.0
[10/03/2024-12:26:29] [I] [TRT] Domain:
[10/03/2024-12:26:29] [I] [TRT] Model version: 0
[10/03/2024-12:26:29] [I] [TRT] Doc string:
[10/03/2024-12:26:29] [I] [TRT] ----------------------------------------------------------------
[10/03/2024-12:26:29] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/03/2024-12:26:29] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[10/03/2024-12:26:29] [I] Finished parsing network model. Parse time: 0.154953
[10/03/2024-12:26:29] [I] [TRT] Graph optimization time: 0.124676 seconds.
[10/03/2024-12:26:29] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
Killed
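If it would help, I can rerun the build while watching memory usage from another terminal, e.g. with tegrastats on the host or free -h inside the container, in case the “Killed” comes from the system running out of memory during the build. That is only my guess, though.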