Process killed during TensorRT conversion on Jetson Orin NX (8 GB)

Hi,

While converting a Mask R-CNN saved_model.pb to TensorRT using this TensorRT conversion script, RAM usage spiked beyond 7 GB and the process was killed on the Jetson Orin NX (8 GB).

Versions:

TensorRT Version: 8.5.2
Jetpack version: 5.1.2
CUDA Version: 11.4
TensorFlow Version: 2.12.0+nv23.6

Thanks in advance

Hi,

Sorry for the late update.

The script converts the model through UFF into TensorRT; the UFF path is deprecated and will be removed in TensorRT 9.0.
We recommend going through the ONNX flow instead: TensorFlow → ONNX → TensorRT.

Is there any reason you prefer the UFF parser?
The UFF parser only supports TensorFlow 1.x-based models.

Thanks.

Hi,

We tried conversion from TF → ONNX with all opset values, and two issues are occurring. First, the model converts successfully to ONNX, but when I deploy the ONNX model on the Triton Inference Server inside the NGC container, the model gives errors while loading on Triton. Second, the same converted ONNX model cannot be converted to TensorRT.

If we provide the open-source weights, is it possible for you to try the conversion on your end? We also raised this issue at the GTC conference in San Jose this year.

A few questions as well:

  1. Is it okay to use tf2onnx to convert a TensorFlow 2.x SavedModel to ONNX?
  2. If yes to question 1, which --opset value should we use?
  3. Should we then use trtexec to convert the ONNX model to TensorRT? If yes, which TensorRT version would you recommend, or is there a Docker container environment you suggest?

Hi,

1. Yes, our users usually convert it with the tf2onnx tool.
2. For JetPack 5.1.2 (TensorRT 8.5), TensorRT supports operators up to opset 17 (see the sketch after this list to confirm what the exported file declares).

3. Yes, please use trtexec to convert the ONNX model into TensorRT, since its log includes environment details that help with debugging.
TensorRT on Jetson is tied to the JetPack version.
You can find the container here.
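
To confirm which opset the exported file actually declares before feeding it to trtexec, here is a minimal sketch, assuming the onnx Python package is installed (the path is a placeholder for wherever your tf2onnx output lands):

import onnx

# Hypothetical path: the file produced by the tf2onnx conversion step.
model = onnx.load("onnx/saved_model.onnx")

# TensorRT 8.5 parses the default ONNX domain up to opset 17, so the
# default-domain entry printed here should be 17 or lower.
for opset in model.opset_import:
    print(opset.domain or "ai.onnx", opset.version)
print("IR version:", model.ir_version)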

We have some tutorials that guide users through converting a TensorFlow model into TensorRT.
You can also check them for more information:

If you still get stuck with the conversion, please share your TensorFlow model as well as the ONNX file so we can give it a check.
Thanks.

Hi,

I ran the conversion from SavedModel to ONNX using the latest tf2onnx with an opset value of 17. It converted to ONNX successfully, but the conversion from ONNX to TensorRT was unsuccessful. See the logs below. All conversions were run inside the NGC Docker container.

Please try to reproduce it on your end. The model is open source and can be downloaded directly with the command below. It was trained using the TF Model Garden's Mask R-CNN implementation.

wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/two_model_strategy/material/material_version_2.zip

Conversion from saved model to ONNX

python3 -m tf2onnx.convert --opset 17 --saved-model saved_model --output onnx/saved_model.onnx

Conversion from ONNX to TensorRT

/usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.trt

The logs from the ONNX to TensorRT conversion are below:

&&&& RUNNING TensorRT.trtexec [TensorRT v8603] # /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.trt
[04/15/2024-18:05:08] [W] --explicitBatch flag has been deprecated and has no effect!
[04/15/2024-18:05:08] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[04/15/2024-18:05:08] [I] === Model Options ===
[04/15/2024-18:05:08] [I] Format: ONNX
[04/15/2024-18:05:08] [I] Model: onnx/saved_model.onnx
[04/15/2024-18:05:08] [I] Output:
[04/15/2024-18:05:08] [I] === Build Options ===
[04/15/2024-18:05:08] [I] Max batch: explicit batch
[04/15/2024-18:05:08] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[04/15/2024-18:05:08] [I] minTiming: 1
[04/15/2024-18:05:08] [I] avgTiming: 8
[04/15/2024-18:05:08] [I] Precision: FP32
[04/15/2024-18:05:08] [I] LayerPrecisions: 
[04/15/2024-18:05:08] [I] Layer Device Types: 
[04/15/2024-18:05:08] [I] Calibration: 
[04/15/2024-18:05:08] [I] Refit: Disabled
[04/15/2024-18:05:08] [I] Version Compatible: Disabled
[04/15/2024-18:05:08] [I] ONNX Native InstanceNorm: Disabled
[04/15/2024-18:05:08] [I] TensorRT runtime: full
[04/15/2024-18:05:08] [I] Lean DLL Path: 
[04/15/2024-18:05:08] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[04/15/2024-18:05:08] [I] Exclude Lean Runtime: Disabled
[04/15/2024-18:05:08] [I] Sparsity: Disabled
[04/15/2024-18:05:08] [I] Safe mode: Disabled
[04/15/2024-18:05:08] [I] Build DLA standalone loadable: Disabled
[04/15/2024-18:05:08] [I] Allow GPU fallback for DLA: Disabled
[04/15/2024-18:05:08] [I] DirectIO mode: Disabled
[04/15/2024-18:05:08] [I] Restricted mode: Disabled
[04/15/2024-18:05:08] [I] Skip inference: Disabled
[04/15/2024-18:05:08] [I] Save engine: tensorrt/saved_model.trt
[04/15/2024-18:05:08] [I] Load engine: 
[04/15/2024-18:05:08] [I] Profiling verbosity: 0
[04/15/2024-18:05:08] [I] Tactic sources: Using default tactic sources
[04/15/2024-18:05:08] [I] timingCacheMode: local
[04/15/2024-18:05:08] [I] timingCacheFile: 
[04/15/2024-18:05:08] [I] Heuristic: Disabled
[04/15/2024-18:05:08] [I] Preview Features: Use default preview flags.
[04/15/2024-18:05:08] [I] MaxAuxStreams: -1
[04/15/2024-18:05:08] [I] BuilderOptimizationLevel: -1
[04/15/2024-18:05:08] [I] Input(s)s format: fp32:CHW
[04/15/2024-18:05:08] [I] Output(s)s format: fp32:CHW
[04/15/2024-18:05:08] [I] Input build shapes: model
[04/15/2024-18:05:08] [I] Input calibration shapes: model
[04/15/2024-18:05:08] [I] === System Options ===
[04/15/2024-18:05:08] [I] Device: 0
[04/15/2024-18:05:08] [I] DLACore: 
[04/15/2024-18:05:08] [I] Plugins:
[04/15/2024-18:05:08] [I] setPluginsToSerialize:
[04/15/2024-18:05:08] [I] dynamicPlugins:
[04/15/2024-18:05:08] [I] ignoreParsedPluginLibs: 0
[04/15/2024-18:05:08] [I] 
[04/15/2024-18:05:08] [I] === Inference Options ===
[04/15/2024-18:05:08] [I] Batch: Explicit
[04/15/2024-18:05:08] [I] Input inference shapes: model
[04/15/2024-18:05:08] [I] Iterations: 10
[04/15/2024-18:05:08] [I] Duration: 3s (+ 200ms warm up)
[04/15/2024-18:05:08] [I] Sleep time: 0ms
[04/15/2024-18:05:08] [I] Idle time: 0ms
[04/15/2024-18:05:08] [I] Inference Streams: 1
[04/15/2024-18:05:08] [I] ExposeDMA: Disabled
[04/15/2024-18:05:08] [I] Data transfers: Enabled
[04/15/2024-18:05:08] [I] Spin-wait: Disabled
[04/15/2024-18:05:08] [I] Multithreading: Disabled
[04/15/2024-18:05:08] [I] CUDA Graph: Disabled
[04/15/2024-18:05:08] [I] Separate profiling: Disabled
[04/15/2024-18:05:08] [I] Time Deserialize: Disabled
[04/15/2024-18:05:08] [I] Time Refit: Disabled
[04/15/2024-18:05:08] [I] NVTX verbosity: 0
[04/15/2024-18:05:08] [I] Persistent Cache Ratio: 0
[04/15/2024-18:05:08] [I] Inputs:
[04/15/2024-18:05:08] [I] === Reporting Options ===
[04/15/2024-18:05:08] [I] Verbose: Disabled
[04/15/2024-18:05:08] [I] Averages: 10 inferences
[04/15/2024-18:05:08] [I] Percentiles: 90,95,99
[04/15/2024-18:05:08] [I] Dump refittable layers:Disabled
[04/15/2024-18:05:08] [I] Dump output: Disabled
[04/15/2024-18:05:08] [I] Profile: Disabled
[04/15/2024-18:05:08] [I] Export timing to JSON file: 
[04/15/2024-18:05:08] [I] Export output to JSON file: 
[04/15/2024-18:05:08] [I] Export profile to JSON file: 
[04/15/2024-18:05:08] [I] 
[04/15/2024-18:05:12] [I] === Device Information ===
[04/15/2024-18:05:12] [I] Selected Device: Tesla T4
[04/15/2024-18:05:12] [I] Compute Capability: 7.5
[04/15/2024-18:05:12] [I] SMs: 40
[04/15/2024-18:05:12] [I] Device Global Memory: 14928 MiB
[04/15/2024-18:05:12] [I] Shared Memory per SM: 64 KiB
[04/15/2024-18:05:12] [I] Memory Bus Width: 256 bits (ECC enabled)
[04/15/2024-18:05:12] [I] Application Compute Clock Rate: 1.59 GHz
[04/15/2024-18:05:12] [I] Application Memory Clock Rate: 5.001 GHz
[04/15/2024-18:05:12] [I] 
[04/15/2024-18:05:12] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[04/15/2024-18:05:12] [I] 
[04/15/2024-18:05:12] [I] TensorRT version: 8.6.3
[04/15/2024-18:05:12] [I] Loading standard plugins
[04/15/2024-18:05:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 22, GPU 105 (MiB)
[04/15/2024-18:05:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +895, GPU +174, now: CPU 994, GPU 279 (MiB)
[04/15/2024-18:05:20] [I] Start parsing network model.
[04/15/2024-18:05:20] [I] [TRT] ----------------------------------------------------------------
[04/15/2024-18:05:20] [I] [TRT] Input filename:   onnx/saved_model.onnx
[04/15/2024-18:05:20] [I] [TRT] ONNX IR version:  0.0.8
[04/15/2024-18:05:20] [I] [TRT] Opset version:    17
[04/15/2024-18:05:20] [I] [TRT] Producer name:    tf2onnx
[04/15/2024-18:05:20] [I] [TRT] Producer version: 1.16.1 15c810
[04/15/2024-18:05:20] [I] [TRT] Domain:           
[04/15/2024-18:05:20] [I] [TRT] Model version:    0
[04/15/2024-18:05:20] [I] [TRT] Doc string:       
[04/15/2024-18:05:20] [I] [TRT] ----------------------------------------------------------------
[04/15/2024-18:05:20] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/15/2024-18:05:20] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
Segmentation fault (core dumped)
root@instance-20240415-152558:/workspace/circularnet# /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8603] # /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=onnx/saved_model.onnx --saveEngine=tensorrt/saved_model.engine
[04/15/2024-18:06:33] [W] --explicitBatch flag has been deprecated and has no effect!
[04/15/2024-18:06:33] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[04/15/2024-18:06:33] [I] === Model Options ===
[04/15/2024-18:06:33] [I] Format: ONNX
[04/15/2024-18:06:33] [I] Model: onnx/saved_model.onnx
[04/15/2024-18:06:33] [I] Output:
[04/15/2024-18:06:33] [I] === Build Options ===
[04/15/2024-18:06:33] [I] Max batch: explicit batch
[04/15/2024-18:06:33] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[04/15/2024-18:06:33] [I] minTiming: 1
[04/15/2024-18:06:33] [I] avgTiming: 8
[04/15/2024-18:06:33] [I] Precision: FP32
[04/15/2024-18:06:33] [I] LayerPrecisions: 
[04/15/2024-18:06:33] [I] Layer Device Types: 
[04/15/2024-18:06:33] [I] Calibration: 
[04/15/2024-18:06:33] [I] Refit: Disabled
[04/15/2024-18:06:33] [I] Version Compatible: Disabled
[04/15/2024-18:06:33] [I] ONNX Native InstanceNorm: Disabled
[04/15/2024-18:06:33] [I] TensorRT runtime: full
[04/15/2024-18:06:33] [I] Lean DLL Path: 
[04/15/2024-18:06:33] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[04/15/2024-18:06:33] [I] Exclude Lean Runtime: Disabled
[04/15/2024-18:06:33] [I] Sparsity: Disabled
[04/15/2024-18:06:33] [I] Safe mode: Disabled
[04/15/2024-18:06:33] [I] Build DLA standalone loadable: Disabled
[04/15/2024-18:06:33] [I] Allow GPU fallback for DLA: Disabled
[04/15/2024-18:06:33] [I] DirectIO mode: Disabled
[04/15/2024-18:06:33] [I] Restricted mode: Disabled
[04/15/2024-18:06:33] [I] Skip inference: Disabled
[04/15/2024-18:06:33] [I] Save engine: tensorrt/saved_model.engine
[04/15/2024-18:06:33] [I] Load engine: 
[04/15/2024-18:06:33] [I] Profiling verbosity: 0
[04/15/2024-18:06:33] [I] Tactic sources: Using default tactic sources
[04/15/2024-18:06:33] [I] timingCacheMode: local
[04/15/2024-18:06:33] [I] timingCacheFile: 
[04/15/2024-18:06:33] [I] Heuristic: Disabled
[04/15/2024-18:06:33] [I] Preview Features: Use default preview flags.
[04/15/2024-18:06:33] [I] MaxAuxStreams: -1
[04/15/2024-18:06:33] [I] BuilderOptimizationLevel: -1
[04/15/2024-18:06:33] [I] Input(s)s format: fp32:CHW
[04/15/2024-18:06:33] [I] Output(s)s format: fp32:CHW
[04/15/2024-18:06:33] [I] Input build shapes: model
[04/15/2024-18:06:33] [I] Input calibration shapes: model
[04/15/2024-18:06:33] [I] === System Options ===
[04/15/2024-18:06:33] [I] Device: 0
[04/15/2024-18:06:33] [I] DLACore: 
[04/15/2024-18:06:33] [I] Plugins:
[04/15/2024-18:06:33] [I] setPluginsToSerialize:
[04/15/2024-18:06:33] [I] dynamicPlugins:
[04/15/2024-18:06:33] [I] ignoreParsedPluginLibs: 0
[04/15/2024-18:06:33] [I] 
[04/15/2024-18:06:33] [I] === Inference Options ===
[04/15/2024-18:06:33] [I] Batch: Explicit
[04/15/2024-18:06:33] [I] Input inference shapes: model
[04/15/2024-18:06:33] [I] Iterations: 10
[04/15/2024-18:06:33] [I] Duration: 3s (+ 200ms warm up)
[04/15/2024-18:06:33] [I] Sleep time: 0ms
[04/15/2024-18:06:33] [I] Idle time: 0ms
[04/15/2024-18:06:33] [I] Inference Streams: 1
[04/15/2024-18:06:33] [I] ExposeDMA: Disabled
[04/15/2024-18:06:33] [I] Data transfers: Enabled
[04/15/2024-18:06:33] [I] Spin-wait: Disabled
[04/15/2024-18:06:33] [I] Multithreading: Disabled
[04/15/2024-18:06:33] [I] CUDA Graph: Disabled
[04/15/2024-18:06:33] [I] Separate profiling: Disabled
[04/15/2024-18:06:33] [I] Time Deserialize: Disabled
[04/15/2024-18:06:33] [I] Time Refit: Disabled
[04/15/2024-18:06:33] [I] NVTX verbosity: 0
[04/15/2024-18:06:33] [I] Persistent Cache Ratio: 0
[04/15/2024-18:06:33] [I] Inputs:
[04/15/2024-18:06:33] [I] === Reporting Options ===
[04/15/2024-18:06:33] [I] Verbose: Disabled
[04/15/2024-18:06:33] [I] Averages: 10 inferences
[04/15/2024-18:06:33] [I] Percentiles: 90,95,99
[04/15/2024-18:06:33] [I] Dump refittable layers:Disabled
[04/15/2024-18:06:33] [I] Dump output: Disabled
[04/15/2024-18:06:33] [I] Profile: Disabled
[04/15/2024-18:06:33] [I] Export timing to JSON file: 
[04/15/2024-18:06:33] [I] Export output to JSON file: 
[04/15/2024-18:06:33] [I] Export profile to JSON file: 
[04/15/2024-18:06:33] [I] 
[04/15/2024-18:06:38] [I] === Device Information ===
[04/15/2024-18:06:38] [I] Selected Device: Tesla T4
[04/15/2024-18:06:38] [I] Compute Capability: 7.5
[04/15/2024-18:06:38] [I] SMs: 40
[04/15/2024-18:06:38] [I] Device Global Memory: 14928 MiB
[04/15/2024-18:06:38] [I] Shared Memory per SM: 64 KiB
[04/15/2024-18:06:38] [I] Memory Bus Width: 256 bits (ECC enabled)
[04/15/2024-18:06:38] [I] Application Compute Clock Rate: 1.59 GHz
[04/15/2024-18:06:38] [I] Application Memory Clock Rate: 5.001 GHz
[04/15/2024-18:06:38] [I] 
[04/15/2024-18:06:38] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[04/15/2024-18:06:38] [I] 
[04/15/2024-18:06:38] [I] TensorRT version: 8.6.3
[04/15/2024-18:06:38] [I] Loading standard plugins
[04/15/2024-18:06:38] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 22, GPU 105 (MiB)
[04/15/2024-18:06:45] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +895, GPU +174, now: CPU 994, GPU 279 (MiB)
[04/15/2024-18:06:45] [I] Start parsing network model.
[04/15/2024-18:06:46] [I] [TRT] ----------------------------------------------------------------
[04/15/2024-18:06:46] [I] [TRT] Input filename:   onnx/saved_model.onnx
[04/15/2024-18:06:46] [I] [TRT] ONNX IR version:  0.0.8
[04/15/2024-18:06:46] [I] [TRT] Opset version:    17
[04/15/2024-18:06:46] [I] [TRT] Producer name:    tf2onnx
[04/15/2024-18:06:46] [I] [TRT] Producer version: 1.16.1 15c810
[04/15/2024-18:06:46] [I] [TRT] Domain:           
[04/15/2024-18:06:46] [I] [TRT] Model version:    0
[04/15/2024-18:06:46] [I] [TRT] Doc string:       
[04/15/2024-18:06:46] [I] [TRT] ----------------------------------------------------------------
[04/15/2024-18:06:46] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/15/2024-18:06:46] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped

Hi,

The TensorRT log seems to be truncated (no error message included).
Could you check it and share the complete version with us?
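
In the meantime, running the ONNX checker and shape inference over the exported file can help narrow down a silent parser crash like this. A minimal sketch, assuming the onnx Python package is installed and using the output path from your tf2onnx command:

import onnx
from onnx import shape_inference

# Path as used in the tf2onnx command above.
model = onnx.load("onnx/saved_model.onnx")

# Structural validation of the graph; raises an exception if the model is malformed.
onnx.checker.check_model(model)

# Shape inference often surfaces inconsistent shapes before the TensorRT parser does.
inferred = shape_inference.infer_shapes(model)
print("checker and shape inference passed;", len(inferred.graph.node), "nodes")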

Thanks.

It’s not truncated. The conversion process halts after that line and exits.

Hi,

We tried to check it but encountered several warnings when converting the TensorFlow model to ONNX:

2024-04-18 03:40:38,702 - WARNING - Importing a function (__inference_internal_grad_fn_83090) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
2024-04-18 03:40:38,873 - WARNING - Importing a function (__inference_internal_grad_fn_81416) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
2024-04-18 03:40:38,895 - WARNING - Importing a function (__inference_internal_grad_fn_87626) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.

Are these harmless, or did something go wrong?

(Edit)
It turns out the conversion fails with an unsupported-node error:

[04/18/2024-03:49:40] [E] [TRT] ModelImporter.cpp:727: --- Begin node ---
[04/18/2024-03:49:40] [E] [TRT] ModelImporter.cpp:728: input: "StatefulPartitionedCall/mask_head/Reshape_1:0"
output: "Transpose__4832:0"
name: "Transpose__4832"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 1
  ints: 2
  ints: 3
  ints: 0
  ints: 1
  ints: 2
  ints: 3
  ints: 4
  type: INTS
}
domain: ""

[04/18/2024-03:49:40] [E] [TRT] ModelImporter.cpp:729: --- End node ---
[04/18/2024-03:49:40] [E] [TRT] ModelImporter.cpp:731: ERROR: ModelImporter.cpp:185 In function parseGraph:
[6] Invalid Node - Transpose__4832
[graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer Transpose__4832: reshape changes volume to multiple of original volume. Reshaping [1,100,28,28,19] to [1,100,28,28,1].)
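
The perm attribute above lists nine indices for what should be a 5-D permutation, so the exported node itself looks malformed. A minimal onnx-graphsurgeon sketch to locate the node and inspect, or experimentally patch, its perm attribute; this assumes the onnx and onnx_graphsurgeon packages are installed, and whether [0, 1, 2, 3, 4] is actually the correct permutation depends on the surrounding graph:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("onnx/saved_model.onnx"))

for node in graph.nodes:
    if node.op == "Transpose" and node.name == "Transpose__4832":
        # Print the malformed attribute reported by the TensorRT parser.
        print(node.name, node.attrs.get("perm"))
        # Experimental workaround only: replace the duplicated indices with an
        # identity permutation over the five dimensions seen by the parser.
        node.attrs["perm"] = [0, 1, 2, 3, 4]

onnx.save(gs.export_onnx(graph), "onnx/saved_model_patched.onnx")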

Thanks.

How are you trying to convert it to ONNX? Can you share the commands and the environment?

Hi,

We followed the instructions you shared above:

$ wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/two_model_strategy/material/material_version_2.zip
$ unzip material_version_2.zip
$ pip3 install tf2onnx
$ python3 -m tf2onnx.convert --opset 17 --saved-model saved_model --output onnx/saved_model.onnx
>>> tensorflow.__version__
'2.12.0'
>>> tf2onnx.__version__
'1.16.1'
>>> onnx.__version__
'1.16.0'

Thanks.

Hi, that is exactly why we reached out to you: the model cannot be converted through the ONNX path. Is there a way to convert the model directly into TensorRT, for example using TF-TRT?
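
For example, something along the lines of the rough sketch below (the TF-TRT TrtGraphConverterV2 API from tensorflow.python.compiler.tensorrt, with placeholder paths and precision mode; not verified on this model):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Rough sketch only: convert the SavedModel with TF-TRT, keeping unsupported
# ops in TensorFlow and offloading supported segments to TensorRT.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model",
    conversion_params=params)
converter.convert()
converter.save("saved_model_trt")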

Hi,

We have better support for ONNX → TensorRT compared to TF-TRT.
Do you know why the model cannot be converted through ONNX?
Is there any known limitation? Have you checked with the tf2onnx maintainers?

Thanks.

That is why we contacted you: to get support for the conversion, either via the ONNX path or the TF-TRT path. Google said they stopped working on TF-TRT and that we could get support from NVIDIA. I am pretty sure someone at NVIDIA must be working on it. We do not know the root cause of why the model cannot be converted, which is why we asked for your help. I have already given you the link to the open-source weights. Can you please help us diagnose it?

Hi,

Sure, we will discuss this internally.
Thanks.

Hi,

We tried to convert the model with this tutorial and found that it doesn't include the required pipeline_config file.
Are you able to generate a pipeline_config for the TF Model Garden model?

Does it have the same architecture as the model below?
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

Mask R-CNN Inception ResNet V2 1024x1024

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.