TF2-to-TensorRT conversion with Conv2DTranspose: generating a serialized engine file

Description

I’ve been trying to convert a TensorFlow 2 .h5 model into a TensorRT serialized engine file, but nothing works because of the Conv2DTranspose layer. Our usual workflow is to convert the model into an intermediate ONNX file (I tried opsets 11 and 9) and then convert the ONNX file into a serialized engine. That fails with this error:

Error Code 10: Internal Error (Could not find any implementation for node StatefulPartitionedCall/net/D1/conv2d_transpose + const_fold_opt__229 + StatefulPartitionedCall/net/D1/BiasAdd + StatefulPartitionedCall/net/D1/Relu.)

So far I have tried keras2onnx, tf2onnx, a notebook for conversion, and the trtexec tool, but none of them worked. According to the ONNX support matrix from here, Conv2DTranspose is not supported, although TensorRT itself supports it. Is there any way I can convert my .h5 to a serialized engine without running into this issue? Any resources or support are greatly appreciated.

Environment

TensorRT Version: 7.1.3
GPU Type: 512-core Volta GPU with Tensor Cores
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2.3.1
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

I cannot share the model because of privacy issues. But the network being used is a U-Net architecture.

Steps To Reproduce

  • Convert a UNet model into an ONNX file using tf2onnx
  • Convert the ONNX file into a serialized engine using trtexec

Conversion log

Command: python3 -m tf2onnx.convert --saved-model unet_segmentation_v1 --output unet_segmentation_v1.onnx --opset 11
Out:

2021-09-29 09:27:23.044171: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/usr/lib/python3.8/runpy.py:127: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
2021-09-29 09:27:25.325309: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-29 09:27:25.391234: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:25.392032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2021-09-29 09:27:25.392065: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-29 09:27:25.413401: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-09-29 09:27:25.413478: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-09-29 09:27:25.420046: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-29 09:27:25.423616: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-29 09:27:25.426418: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-09-29 09:27:25.432849: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-09-29 09:27:25.433993: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-09-29 09:27:25.434167: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:25.435390: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:25.437377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-29 09:27:25.438329: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-29 09:27:25.440154: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:25.441170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2021-09-29 09:27:25.441344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:25.442326: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:25.443166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-29 09:27:25.443560: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-29 09:27:26.411929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 09:27:26.411970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-09-29 09:27:26.411982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-09-29 09:27:26.412221: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:26.412846: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:26.413360: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:26.413801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4204 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-09-29 09:27:26,417 - WARNING - '--tag' not specified for saved_model. Using --tag serve
2021-09-29 09:27:27,197 - INFO - Signatures found in model: [serving_default].
2021-09-29 09:27:27,197 - WARNING - '--signature_def' not specified, using first signature: serving_default
2021-09-29 09:27:27,198 - INFO - Output names: ['decoder_o/p']
2021-09-29 09:27:27.204948: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.205469: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-09-29 09:27:27.205617: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-09-29 09:27:27.206588: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.207114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2021-09-29 09:27:27.207184: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.207695: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.208087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-29 09:27:27.208124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 09:27:27.208132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-09-29 09:27:27.208143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-09-29 09:27:27.208342: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.208884: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.209325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4204 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-09-29 09:27:27.228496: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2894530000 Hz
2021-09-29 09:27:27.252924: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 233 nodes (192), 317 edges (276), time = 9.803ms.
  function_optimizer: function_optimizer did nothing. time = 0.136ms.

2021-09-29 09:27:27.481574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.482110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2021-09-29 09:27:27.482252: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.482775: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.483191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-29 09:27:27.483238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 09:27:27.483245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-09-29 09:27:27.483254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-09-29 09:27:27.483429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.483964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.484437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4204 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /home/vishnnu/.local/lib/python3.8/site-packages/tf2onnx/tf_loader.py:662: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2021-09-29 09:27:27,516 - WARNING - From /home/vishnnu/.local/lib/python3.8/site-packages/tf2onnx/tf_loader.py:662: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2021-09-29 09:27:27.525671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.526173: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-09-29 09:27:27.526299: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-09-29 09:27:27.526741: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.527220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2021-09-29 09:27:27.527282: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.527757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.528135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-29 09:27:27.528170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 09:27:27.528179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-09-29 09:27:27.528187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-09-29 09:27:27.528397: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.528930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-29 09:27:27.529365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4204 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-09-29 09:27:27.561559: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 137 nodes (-76), 221 edges (-76), time = 16.399ms.
  function_optimizer: function_optimizer did nothing. time = 0.2ms.
  constant_folding: Graph size after: 137 nodes (0), 221 edges (0), time = 3.994ms.
  function_optimizer: function_optimizer did nothing. time = 0.204ms.

2021-09-29 09:27:27,610 - INFO - Using tensorflow=2.5.0, onnx=1.9.0, tf2onnx=1.9.1/8e8c23
2021-09-29 09:27:27,610 - INFO - Using opset <onnx, 11>
2021-09-29 09:27:27,719 - INFO - Computed 0 values for constant folding
2021-09-29 09:27:28,092 - INFO - Optimizing ONNX model
2021-09-29 09:27:28,309 - INFO - After optimization: Add -1 (9->8), Cast -5 (5->0), Concat -5 (5->0), Const -45 (83->38), Identity -7 (7->0), Shape -5 (5->0), Slice -5 (5->0), Squeeze -5 (5->0), Transpose -30 (32->2), Unsqueeze -20 (20->0)
2021-09-29 09:27:28,315 - INFO - 
2021-09-29 09:27:28,315 - INFO - Successfully converted TensorFlow model unet_segmentation_v1 to ONNX
2021-09-29 09:27:28,315 - INFO - Model inputs: ['input_layer']
2021-09-29 09:27:28,315 - INFO - Model outputs: ['decoder_o/p']

Command: trtexec --onnx=unet_segmentation_v1.onnx --saveEngine=unet_segmentation_v1.plan --fp16 --optShapes=1x448x448x3 --shapes=1x448x448x3
Out:

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # trtexec --onnx=unet_segmentation_v1.onnx --saveEngine=unet_segmentation_v1.plan --fp16 --optShapes=1x448x448x3 --shapes=1x448x448x3
[09/29/2021-09:28:38] [I] === Model Options ===
[09/29/2021-09:28:38] [I] Format: ONNX
[09/29/2021-09:28:38] [I] Model: unet_segmentation_v1.onnx
[09/29/2021-09:28:38] [I] Output:
[09/29/2021-09:28:38] [I] === Build Options ===
[09/29/2021-09:28:38] [I] Max batch: explicit
[09/29/2021-09:28:38] [I] Workspace: 16 MiB
[09/29/2021-09:28:38] [I] minTiming: 1
[09/29/2021-09:28:38] [I] avgTiming: 8
[09/29/2021-09:28:38] [I] Precision: FP32+FP16
[09/29/2021-09:28:38] [I] Calibration: 
[09/29/2021-09:28:38] [I] Refit: Disabled
[09/29/2021-09:28:38] [I] Sparsity: Disabled
[09/29/2021-09:28:38] [I] Safe mode: Disabled
[09/29/2021-09:28:38] [I] Restricted mode: Disabled
[09/29/2021-09:28:38] [I] Save engine: unet_segmentation_v1.plan
[09/29/2021-09:28:38] [I] Load engine: 
[09/29/2021-09:28:38] [I] NVTX verbosity: 0
[09/29/2021-09:28:38] [I] Tactic sources: Using default tactic sources
[09/29/2021-09:28:38] [I] timingCacheMode: local
[09/29/2021-09:28:38] [I] timingCacheFile: 
[09/29/2021-09:28:38] [I] Input(s)s format: fp32:CHW
[09/29/2021-09:28:38] [I] Output(s)s format: fp32:CHW
[09/29/2021-09:28:38] [I] Input build shape: 1x448x448x3=1x448x448x3+1x448x448x3+1x448x448x3
[09/29/2021-09:28:38] [I] Input calibration shapes: model
[09/29/2021-09:28:38] [I] === System Options ===
[09/29/2021-09:28:38] [I] Device: 0
[09/29/2021-09:28:38] [I] DLACore: 
[09/29/2021-09:28:38] [I] Plugins:
[09/29/2021-09:28:38] [I] === Inference Options ===
[09/29/2021-09:28:38] [I] Batch: Explicit
[09/29/2021-09:28:38] [I] Input inference shape: 1x448x448x3=1x448x448x3
[09/29/2021-09:28:38] [I] Iterations: 10
[09/29/2021-09:28:38] [I] Duration: 3s (+ 200ms warm up)
[09/29/2021-09:28:38] [I] Sleep time: 0ms
[09/29/2021-09:28:38] [I] Streams: 1
[09/29/2021-09:28:38] [I] ExposeDMA: Disabled
[09/29/2021-09:28:38] [I] Data transfers: Enabled
[09/29/2021-09:28:38] [I] Spin-wait: Disabled
[09/29/2021-09:28:38] [I] Multithreading: Disabled
[09/29/2021-09:28:38] [I] CUDA Graph: Disabled
[09/29/2021-09:28:38] [I] Separate profiling: Disabled
[09/29/2021-09:28:38] [I] Time Deserialize: Disabled
[09/29/2021-09:28:38] [I] Time Refit: Disabled
[09/29/2021-09:28:38] [I] Skip inference: Disabled
[09/29/2021-09:28:38] [I] Inputs:
[09/29/2021-09:28:38] [I] === Reporting Options ===
[09/29/2021-09:28:38] [I] Verbose: Disabled
[09/29/2021-09:28:38] [I] Averages: 10 inferences
[09/29/2021-09:28:38] [I] Percentile: 99
[09/29/2021-09:28:38] [I] Dump refittable layers:Disabled
[09/29/2021-09:28:38] [I] Dump output: Disabled
[09/29/2021-09:28:38] [I] Profile: Disabled
[09/29/2021-09:28:38] [I] Export timing to JSON file: 
[09/29/2021-09:28:38] [I] Export output to JSON file: 
[09/29/2021-09:28:38] [I] Export profile to JSON file: 
[09/29/2021-09:28:38] [I] 
[09/29/2021-09:28:38] [I] === Device Information ===
[09/29/2021-09:28:38] [I] Selected Device: NVIDIA GeForce GTX 1660 Ti
[09/29/2021-09:28:38] [I] Compute Capability: 7.5
[09/29/2021-09:28:38] [I] SMs: 24
[09/29/2021-09:28:38] [I] Compute Clock Rate: 1.59 GHz
[09/29/2021-09:28:38] [I] Device Global Memory: 5944 MiB
[09/29/2021-09:28:38] [I] Shared Memory per SM: 64 KiB
[09/29/2021-09:28:38] [I] Memory Bus Width: 192 bits (ECC disabled)
[09/29/2021-09:28:38] [I] Memory Clock Rate: 6.001 GHz
[09/29/2021-09:28:38] [I] 
[09/29/2021-09:28:38] [I] TensorRT version: 8001
[09/29/2021-09:28:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 335, GPU 672 (MiB)
[09/29/2021-09:28:39] [I] Start parsing network model
[09/29/2021-09:28:39] [I] [TRT] ----------------------------------------------------------------
[09/29/2021-09:28:39] [I] [TRT] Input filename:   unet_segmentation_v1.onnx
[09/29/2021-09:28:39] [I] [TRT] ONNX IR version:  0.0.6
[09/29/2021-09:28:39] [I] [TRT] Opset version:    11
[09/29/2021-09:28:39] [I] [TRT] Producer name:    tf2onnx
[09/29/2021-09:28:39] [I] [TRT] Producer version: 1.9.1
[09/29/2021-09:28:39] [I] [TRT] Domain:           
[09/29/2021-09:28:39] [I] [TRT] Model version:    0
[09/29/2021-09:28:39] [I] [TRT] Doc string:       
[09/29/2021-09:28:39] [I] [TRT] ----------------------------------------------------------------
[09/29/2021-09:28:39] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/29/2021-09:28:39] [I] Finish parsing network model
[09/29/2021-09:28:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 338, GPU 672 (MiB)
[09/29/2021-09:28:39] [W] Dynamic dimensions required for input: input_layer, but no shapes were provided. Automatically overriding shape to: 1x448x448x3
[09/29/2021-09:28:39] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 338 MiB, GPU 672 MiB
[09/29/2021-09:28:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +481, GPU +206, now: CPU 819, GPU 878 (MiB)
[09/29/2021-09:28:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +394, GPU +193, now: CPU 1213, GPU 1071 (MiB)
[09/29/2021-09:28:41] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
[09/29/2021-09:28:41] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[09/29/2021-09:28:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/29/2021-09:30:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1607, GPU 1216 (MiB)
[09/29/2021-09:30:00] [E] Error[10]: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node StatefulPartitionedCall/net/D1/conv2d_transpose + const_fold_opt__229 + StatefulPartitionedCall/net/D1/BiasAdd + StatefulPartitionedCall/net/D1/Relu.)
[09/29/2021-09:30:00] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
# Raises an exception if the model is invalid; returns None on success.
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

unet_segmentation.onnx (2.1 MB)
This is the ONNX I’m trying to convert. I’m using the 2 commands I’ve posted in the question to convert the model.

I also ran the script; there was no output. Nothing was written to stdout and None was returned.
I’ll attach the log you’ve requested as well. conversion.log (81.5 KB)

Hi,

After increasing the workspace size, we could not reproduce the issue. Please try the --workspace option.

Thank you.

I increased the workspace to 256MB and it worked. Rather a simple solution but nevertheless it worked. Thank you very much. If you don’t mind, can you please tell me why the increase in workspace made it work? Or guide me to any documentation?
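
For anyone landing here later, the fix reported above can be sketched as a small Python helper that assembles the trtexec command line. This is only a sketch under a few assumptions: `build_trtexec_command` is a hypothetical helper name, the file names and the input name `input_layer` are taken from the logs in the question, `--workspace` is given in MiB (matching the "Workspace: 16 MiB" default shown in the log), and the name-qualified `--shapes=input_layer:...` form is my guess at what the original command intended, since the log warns that no shapes were provided for `input_layer`. Newer TensorRT releases may expose the workspace setting differently.

```python
def build_trtexec_command(onnx_path, engine_path, input_name, shape,
                          workspace_mib=256, fp16=True):
    """Assemble a trtexec argument list (hypothetical helper, not part of TensorRT)."""
    # trtexec expects dimensions joined with "x", e.g. 1x448x448x3
    dims = "x".join(str(d) for d in shape)
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        # Shapes for dynamic inputs are qualified with the input tensor name.
        f"--shapes={input_name}:{dims}",
        # Workspace in MiB; 256 is the value reported to work in this thread.
        f"--workspace={workspace_mib}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = build_trtexec_command(
    "unet_segmentation_v1.onnx",
    "unet_segmentation_v1.plan",
    "input_layer",
    (1, 448, 448, 3),
)
# Print the command so it can be pasted into a shell, or pass `cmd`
# to subprocess.run(cmd, check=True) on a machine with trtexec installed.
print(" ".join(cmd))
```

If 256 MiB is still not enough for a larger network, raising `workspace_mib` further would be the first thing to try.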

I am also interested in the same conversion, using a U-Net architecture that contains a Conv2DTranspose layer. From what I could gather from the TensorRT documentation:

The Conv2DTranspose layer is not listed as a supported operator in TensorRT.

The documentation also mentions that only preliminary tests have been carried out on model architectures other than image classifiers. With this in mind, would the model converted to TensorRT even be expected to produce the same output as the TensorFlow version?

If not, would this conversion even be advisable?

Thanks for any advice you can give.

Hi,

This is a known issue: TensorRT’s Conv2DTranspose requires a fairly large amount of workspace. This may be fixed in future releases.

Thank you.
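
In hindsight, the trtexec log in the question already carried the relevant hints: the default 16 MiB workspace, the "Some tactics do not have sufficient workspace memory" notice, and the final "Could not find any implementation for node" error. A minimal sketch of a log check that flags this combination is below. The function name `diagnose_trtexec_log` is hypothetical, the three sample lines are copied from the log above, and the matching is plain string search, so it is only as reliable as the log wording stays stable across TensorRT versions.

```python
import re

# Patterns taken from the trtexec output quoted in this thread.
WORKSPACE_RE = re.compile(r"Workspace:\s*(\d+)\s*MiB")
WORKSPACE_HINT = "Some tactics do not have sufficient workspace memory"
NO_IMPL_ERROR = "Could not find any implementation for node"


def diagnose_trtexec_log(log_text):
    """Return the workspace size and whether the two workspace-related messages appear."""
    match = WORKSPACE_RE.search(log_text)
    return {
        "workspace_mib": int(match.group(1)) if match else None,
        "workspace_hint": WORKSPACE_HINT in log_text,
        "no_implementation": NO_IMPL_ERROR in log_text,
    }


sample = "\n".join([
    "[09/29/2021-09:28:38] [I] Workspace: 16 MiB",
    "[09/29/2021-09:28:45] [I] [TRT] Some tactics do not have sufficient "
    "workspace memory to run. Increasing workspace size may increase "
    "performance, please check verbose output.",
    "[09/29/2021-09:30:00] [E] Error[10]: [optimizer.cpp::computeCosts::1855] "
    "Error Code 10: Internal Error (Could not find any implementation for node "
    "StatefulPartitionedCall/net/D1/conv2d_transpose + const_fold_opt__229 + "
    "StatefulPartitionedCall/net/D1/BiasAdd + StatefulPartitionedCall/net/D1/Relu.)",
])

report = diagnose_trtexec_log(sample)
print(report)
```

When both flags are set and the workspace is small, retrying the build with a larger workspace is a reasonable first step before suspecting the layer itself.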
