Hi,
I converted a TensorFlow SavedModel to ONNX with tf2onnx 1.16.1 (the latest version, per the log below), using opset 17. The conversion to ONNX succeeded, but the subsequent conversion from ONNX to TensorRT failed with a segmentation fault. See the logs below. All conversions were run inside the NGC Docker container.
Please try to reproduce this on your end. The model is open source and can be downloaded directly with the command below; it was trained with TF Model Garden's Mask R-CNN implementation.
wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/two_model_strategy/material/material_version_2.zip
Conversion from SavedModel to ONNX:
python3 -m tf2onnx.convert --opset 17 --saved-model saved_model --output onnx/saved_model.onnx
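For reference, a quick sanity check on the exported ONNX file before involving TensorRT (a minimal sketch, assuming the onnx and onnxruntime packages are installed in the container; the file path matches the command above):

# Sanity-check the exported ONNX graph outside TensorRT (minimal sketch;
# assumes the onnx and onnxruntime packages are installed).
import onnx
import onnxruntime as ort

model = onnx.load("onnx/saved_model.onnx")
onnx.checker.check_model(model)  # raises if the graph is malformed
print("opset:", model.opset_import[0].version)

# Confirm the model loads and exposes its inputs under ONNX Runtime.
sess = ort.InferenceSession("onnx/saved_model.onnx",
                            providers=["CPUExecutionProvider"])
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)

Both the structural check and the ONNX Runtime load succeed on this file, so the failure appears to be on the TensorRT side.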
Conversion from ONNX to TensorRT:
/usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.trt
The logs from the ONNX-to-TensorRT conversion are below:
&&&& RUNNING TensorRT.trtexec [TensorRT v8603] # /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.trt
[04/15/2024-18:05:08] [W] --explicitBatch flag has been deprecated and has no effect!
[04/15/2024-18:05:08] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[04/15/2024-18:05:08] [I] === Model Options ===
[04/15/2024-18:05:08] [I] Format: ONNX
[04/15/2024-18:05:08] [I] Model: onnx/saved_model.onnx
[04/15/2024-18:05:08] [I] Output:
[04/15/2024-18:05:08] [I] === Build Options ===
[04/15/2024-18:05:08] [I] Max batch: explicit batch
[04/15/2024-18:05:08] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[04/15/2024-18:05:08] [I] minTiming: 1
[04/15/2024-18:05:08] [I] avgTiming: 8
[04/15/2024-18:05:08] [I] Precision: FP32
[04/15/2024-18:05:08] [I] LayerPrecisions:
[04/15/2024-18:05:08] [I] Layer Device Types:
[04/15/2024-18:05:08] [I] Calibration:
[04/15/2024-18:05:08] [I] Refit: Disabled
[04/15/2024-18:05:08] [I] Version Compatible: Disabled
[04/15/2024-18:05:08] [I] ONNX Native InstanceNorm: Disabled
[04/15/2024-18:05:08] [I] TensorRT runtime: full
[04/15/2024-18:05:08] [I] Lean DLL Path:
[04/15/2024-18:05:08] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[04/15/2024-18:05:08] [I] Exclude Lean Runtime: Disabled
[04/15/2024-18:05:08] [I] Sparsity: Disabled
[04/15/2024-18:05:08] [I] Safe mode: Disabled
[04/15/2024-18:05:08] [I] Build DLA standalone loadable: Disabled
[04/15/2024-18:05:08] [I] Allow GPU fallback for DLA: Disabled
[04/15/2024-18:05:08] [I] DirectIO mode: Disabled
[04/15/2024-18:05:08] [I] Restricted mode: Disabled
[04/15/2024-18:05:08] [I] Skip inference: Disabled
[04/15/2024-18:05:08] [I] Save engine: tensorrt/saved_model.trt
[04/15/2024-18:05:08] [I] Load engine:
[04/15/2024-18:05:08] [I] Profiling verbosity: 0
[04/15/2024-18:05:08] [I] Tactic sources: Using default tactic sources
[04/15/2024-18:05:08] [I] timingCacheMode: local
[04/15/2024-18:05:08] [I] timingCacheFile:
[04/15/2024-18:05:08] [I] Heuristic: Disabled
[04/15/2024-18:05:08] [I] Preview Features: Use default preview flags.
[04/15/2024-18:05:08] [I] MaxAuxStreams: -1
[04/15/2024-18:05:08] [I] BuilderOptimizationLevel: -1
[04/15/2024-18:05:08] [I] Input(s)s format: fp32:CHW
[04/15/2024-18:05:08] [I] Output(s)s format: fp32:CHW
[04/15/2024-18:05:08] [I] Input build shapes: model
[04/15/2024-18:05:08] [I] Input calibration shapes: model
[04/15/2024-18:05:08] [I] === System Options ===
[04/15/2024-18:05:08] [I] Device: 0
[04/15/2024-18:05:08] [I] DLACore:
[04/15/2024-18:05:08] [I] Plugins:
[04/15/2024-18:05:08] [I] setPluginsToSerialize:
[04/15/2024-18:05:08] [I] dynamicPlugins:
[04/15/2024-18:05:08] [I] ignoreParsedPluginLibs: 0
[04/15/2024-18:05:08] [I]
[04/15/2024-18:05:08] [I] === Inference Options ===
[04/15/2024-18:05:08] [I] Batch: Explicit
[04/15/2024-18:05:08] [I] Input inference shapes: model
[04/15/2024-18:05:08] [I] Iterations: 10
[04/15/2024-18:05:08] [I] Duration: 3s (+ 200ms warm up)
[04/15/2024-18:05:08] [I] Sleep time: 0ms
[04/15/2024-18:05:08] [I] Idle time: 0ms
[04/15/2024-18:05:08] [I] Inference Streams: 1
[04/15/2024-18:05:08] [I] ExposeDMA: Disabled
[04/15/2024-18:05:08] [I] Data transfers: Enabled
[04/15/2024-18:05:08] [I] Spin-wait: Disabled
[04/15/2024-18:05:08] [I] Multithreading: Disabled
[04/15/2024-18:05:08] [I] CUDA Graph: Disabled
[04/15/2024-18:05:08] [I] Separate profiling: Disabled
[04/15/2024-18:05:08] [I] Time Deserialize: Disabled
[04/15/2024-18:05:08] [I] Time Refit: Disabled
[04/15/2024-18:05:08] [I] NVTX verbosity: 0
[04/15/2024-18:05:08] [I] Persistent Cache Ratio: 0
[04/15/2024-18:05:08] [I] Inputs:
[04/15/2024-18:05:08] [I] === Reporting Options ===
[04/15/2024-18:05:08] [I] Verbose: Disabled
[04/15/2024-18:05:08] [I] Averages: 10 inferences
[04/15/2024-18:05:08] [I] Percentiles: 90,95,99
[04/15/2024-18:05:08] [I] Dump refittable layers:Disabled
[04/15/2024-18:05:08] [I] Dump output: Disabled
[04/15/2024-18:05:08] [I] Profile: Disabled
[04/15/2024-18:05:08] [I] Export timing to JSON file:
[04/15/2024-18:05:08] [I] Export output to JSON file:
[04/15/2024-18:05:08] [I] Export profile to JSON file:
[04/15/2024-18:05:08] [I]
[04/15/2024-18:05:12] [I] === Device Information ===
[04/15/2024-18:05:12] [I] Selected Device: Tesla T4
[04/15/2024-18:05:12] [I] Compute Capability: 7.5
[04/15/2024-18:05:12] [I] SMs: 40
[04/15/2024-18:05:12] [I] Device Global Memory: 14928 MiB
[04/15/2024-18:05:12] [I] Shared Memory per SM: 64 KiB
[04/15/2024-18:05:12] [I] Memory Bus Width: 256 bits (ECC enabled)
[04/15/2024-18:05:12] [I] Application Compute Clock Rate: 1.59 GHz
[04/15/2024-18:05:12] [I] Application Memory Clock Rate: 5.001 GHz
[04/15/2024-18:05:12] [I]
[04/15/2024-18:05:12] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[04/15/2024-18:05:12] [I]
[04/15/2024-18:05:12] [I] TensorRT version: 8.6.3
[04/15/2024-18:05:12] [I] Loading standard plugins
[04/15/2024-18:05:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 22, GPU 105 (MiB)
[04/15/2024-18:05:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +895, GPU +174, now: CPU 994, GPU 279 (MiB)
[04/15/2024-18:05:20] [I] Start parsing network model.
[04/15/2024-18:05:20] [I] [TRT] ----------------------------------------------------------------
[04/15/2024-18:05:20] [I] [TRT] Input filename: onnx/saved_model.onnx
[04/15/2024-18:05:20] [I] [TRT] ONNX IR version: 0.0.8
[04/15/2024-18:05:20] [I] [TRT] Opset version: 17
[04/15/2024-18:05:20] [I] [TRT] Producer name: tf2onnx
[04/15/2024-18:05:20] [I] [TRT] Producer version: 1.16.1 15c810
[04/15/2024-18:05:20] [I] [TRT] Domain:
[04/15/2024-18:05:20] [I] [TRT] Model version: 0
[04/15/2024-18:05:20] [I] [TRT] Doc string:
[04/15/2024-18:05:20] [I] [TRT] ----------------------------------------------------------------
[04/15/2024-18:05:20] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/15/2024-18:05:20] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
Segmentation fault (core dumped)
root@instance-20240415-152558:/workspace/circularnet# /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.engine

(The output of this second run, saving to a .engine file instead, was identical to the log above, ending with the same INT64 cast-down warnings.)
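In case it helps with triage, the same parse step can be driven through the TensorRT Python API, which may surface per-node parser errors instead of a bare segmentation fault (a minimal sketch, assuming the tensorrt Python bindings that ship in the NGC container):

# Parse the ONNX file with the TensorRT Python API to look for
# node-level parser errors (minimal sketch; assumes the tensorrt
# bindings available inside the NGC container).
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("onnx/saved_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))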