TensorRT: Cannot set bindings for dynamic shapes

t_n · June 21, 2021, 7:16am

Description

Hi, I cannot get the following script to run. Apparently the error originates from trying to bind a dynamic shape. Oddly, the error only shows up with TensorRT 8 on my laptop. I had an almost identical script running with TensorRT 7 on an AGX Xavier without any hassle:

trt.py:

import tensorflow as tf
import keras2onnx
import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import time

tf.compat.v1.disable_v2_behavior() 

INPUT_NAME = 'input'
BATCH_SIZE = 100
LEN_INPUT = 5
LEN_OUTPUT = 3
TF_FILE_PATH = './model.h5'
ONNX_FILE_PATH = './model.onnx'
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def convert_to_onnx():
    model = tf.keras.models.load_model(TF_FILE_PATH)
    onnx_model = keras2onnx.convert_keras(model)
    keras2onnx.save_model(onnx_model, ONNX_FILE_PATH)


def build_engine():
    network_creation_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)   

    # initialize TensorRT engine and parse ONNX model
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(network_creation_flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()

    # set input shape min/opt/max for optimization profile
    profile.set_shape(INPUT_NAME, (BATCH_SIZE, LEN_INPUT), \
        (BATCH_SIZE, LEN_INPUT), (BATCH_SIZE, LEN_INPUT))
    print('Profile shape: ' + str(profile.get_shape(INPUT_NAME)))
    add_succ = config.add_optimization_profile(profile)
    print('Added optimization profile successfully: ' + str(add_succ))

    # specify batch size for builder
    builder.max_batch_size = BATCH_SIZE

    # parse ONNX
    print('Beginning ONNX file parsing')
    with open(ONNX_FILE_PATH, 'rb') as model:
        parser.parse(model.read())
    print('Completed parsing of ONNX file')

    # generate TensorRT engine optimized for the target platform
    print('Building an engine...')
    engine = builder.build_engine(network, config=config)
    context = engine.create_execution_context()
    print("Completed creating Engine")

    # set input dimensions at runtime
    print('Engine binding shape: ' + str(engine.get_profile_shape(profile_index=0, binding=0)))
    context.set_binding_shape(0, (BATCH_SIZE, LEN_INPUT))
    print('Binding set')

    return engine, context


def inference(engine, context):
    d_input = cuda.mem_alloc(BATCH_SIZE * LEN_INPUT * np.dtype(np.float32).itemsize)
    d_output = cuda.mem_alloc(BATCH_SIZE * LEN_OUTPUT * np.dtype(np.float32).itemsize)
    bindings = [int(d_input), int(d_output)]

    stream = cuda.Stream()

    t1 = time.time()

    for i in range(500):
        input = np.random.random((BATCH_SIZE,LEN_INPUT)).astype(np.float32)
        output = np.random.random((BATCH_SIZE,LEN_OUTPUT)).astype(np.float32)
        cuda.memcpy_htod_async(d_input, input, stream)
        context.execute_async(BATCH_SIZE, bindings, stream.handle, None)
        cuda.memcpy_dtoh_async(output, d_output, stream)
        stream.synchronize()
        #print ("Prediction: " + str(output))

    print('Total execution time: ' + str(time.time() - t1)) 

if __name__ == '__main__':
    convert_to_onnx()
    engine, context = build_engine()
    inference(engine, context)

console output:

2021-06-21 09:06:08.543047: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/keras/initializers/initializers_v1.py:47: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-06-21 09:06:09.918908: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.919050: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-21 09:06:09.919222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-21 09:06:09.919590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA Quadro M2000M computeCapability: 5.0
coreClock: 1.137GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 74.65GiB/s
2021-06-21 09:06:09.919607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-21 09:06:09.919677: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-21 09:06:09.919748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-21 09:06:09.920580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-21 09:06:09.920613: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-21 09:06:09.920770: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64:/home/bcaie/xavier_hybrid_models/catkin_ws/devel/lib:/opt/ros/noetic/lib
2021-06-21 09:06:09.921560: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-21 09:06:09.921591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-21 09:06:09.921599: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-21 09:06:09.922140: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.922166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-21 09:06:09.922175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2021-06-21 09:06:09.925542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-06-21 09:06:09.926639: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2799925000 Hz
tf executing eager_mode: False
tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 13 -> 8
The maximum opset needed by this model is only 9.
None
Profile shape: [(100, 5), (100, 5), (100, 5)]
Added optimization profile successfully: 0
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine...
trt.py:56: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config=config)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
Completed creating Engine
Engine binding shape: [(0, 5), (0, 5), (0, 5)]
[TensorRT] ERROR: [executionContext.cpp::setBindingDimensions::954] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::954, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [100,5] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 0, minimum dimension in profile is 0, but supplied dimension is 100.
)
Binding set
Total execution time: 0.014324188232421875
terminate called after throwing an instance of 'nvinfer1::CudaDriverError'
  what():  TensorRT internal error
Aborted (core dumped)
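
The failing check in the log, `profileMaxDims.d[i] >= dimensions.d[i]`, compares each requested dimension with the min/max range stored in the engine's optimization profile. A minimal pure-Python sketch of that comparison (the helper name `first_out_of_range_dim` is hypothetical and not a TensorRT API) shows why `(100, 5)` is rejected when the engine reports `(0, 5)` for both min and max:

```python
def first_out_of_range_dim(shape, min_shape, max_shape):
    """Return the index of the first dimension outside [min, max], or None.

    Hypothetical helper that mimics the range check reported in the log;
    it is not part of the TensorRT API.
    """
    if not (len(shape) == len(min_shape) == len(max_shape)):
        raise ValueError("rank mismatch between shape and profile")
    for i, (dim, lo, hi) in enumerate(zip(shape, min_shape, max_shape)):
        if dim < lo or dim > hi:
            return i
    return None


# Values taken from the console output: the engine reports (0, 5) for
# min and max, while the script requests (100, 5).
print(first_out_of_range_dim((100, 5), (0, 5), (0, 5)))      # 0
# With the range the script intended to register, the shape would fit.
print(first_out_of_range_dim((100, 5), (100, 5), (100, 5)))  # None
```

In the script, the same comparison could be made on the result of `engine.get_profile_shape(profile_index=0, binding=0)` before calling `context.set_binding_shape`, so that a mismatch is reported before inference starts.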

Environment

TensorRT Version: 8.0.0.3
GPU Type: Quadro M2000
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.0.53
Operating System + Version: Ubuntu 20.04.1
Python Version (if applicable): 3.8.5
TensorFlow Version (if applicable): 2.5.0
Baremetal or Container (if container which image + tag): Baremetal + python virtualenv

Relevant Files

I’ve attached both the original .h5 file from Keras and the converted .onnx file, together with the Netron outputs (in case it helps):
model.h5 (33.8 KB)
model.onnx (1.8 KB)

Steps To Reproduce

python3 trt.py

NVES · June 21, 2021, 7:37am

Hi,
This looks like a Jetson issue. We recommend raising it on the respective platform's forum via the link below.

Thanks!

t_n · June 21, 2021, 7:49am

Thanks, but it is NOT a Jetson issue. The code runs fine (in a slightly modified version) on a Jetson with TensorRT 7 but fails on an HP ZBook with the configuration described above.

EDIT: I was wondering whether it is related to the name of the input. The script currently uses “input”, but the .onnx file has “dense_input” as its input. However, the script crashes if I change the input name to “dense_input”.
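
One way to check for such a name mismatch is to list the input names TensorRT sees after parsing and compare them with the names passed to `profile.set_shape`. The sketch below assumes the `num_inputs` attribute and `get_input(i).name` accessor of the TensorRT network definition; the helper name and the stand-in classes are hypothetical and exist only so the snippet runs without TensorRT installed.

```python
def missing_profile_inputs(network, profile_names):
    """Return network input names that have no entry in profile_names."""
    network_inputs = [network.get_input(i).name for i in range(network.num_inputs)]
    return [name for name in network_inputs if name not in profile_names]


# Stand-ins that mimic only the two members used above (hypothetical).
class FakeTensor:
    def __init__(self, name):
        self.name = name


class FakeNetwork:
    def __init__(self, names):
        self._tensors = [FakeTensor(n) for n in names]
        self.num_inputs = len(self._tensors)

    def get_input(self, index):
        return self._tensors[index]


network = FakeNetwork(["dense_input"])  # input name reported for model.onnx
print(missing_profile_inputs(network, {"input"}))        # ['dense_input']
print(missing_profile_inputs(network, {"dense_input"}))  # []
```

With a real network the check is only meaningful after `parser.parse(...)` has run, because the network has no inputs before the ONNX file is parsed.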

t_n · June 21, 2021, 9:59am

Thanks, I was able to fix the problem:

- set the input name to “dense_input”
- make the first dimension of the input shape dynamic using graphsurgeon
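
The second step could look roughly like the sketch below. It assumes that "graphsurgeon" refers to NVIDIA's ONNX GraphSurgeon (`onnx_graphsurgeon`) and that only the leading (batch) dimension of each graph input and output needs to become dynamic. The function names, the symbolic dimension name `"N"`, and the output path are hypothetical; the code actually used for the fix was not shared in this thread.

```python
def with_dynamic_batch(shape, marker="N"):
    """Return a copy of shape whose first dimension is replaced by marker."""
    shape = list(shape)
    if not shape:
        raise ValueError("a scalar has no batch dimension to make dynamic")
    return [marker] + shape[1:]


def make_batch_dynamic(onnx_in, onnx_out):
    """Rewrite an ONNX file so graph inputs/outputs get a symbolic first dim."""
    # Imported lazily so the helper above works without these packages.
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load(onnx_in))
    for tensor in list(graph.inputs) + list(graph.outputs):
        if tensor.shape is not None:
            # A string dimension is exported as a symbolic (dynamic) dimension.
            tensor.shape = with_dynamic_batch(tensor.shape)
    onnx.save(gs.export_onnx(graph), onnx_out)


print(with_dynamic_batch((100, 5)))  # ['N', 5]
# make_batch_dynamic("./model.onnx", "./model_dynamic.onnx")  # hypothetical output path
```

After such a rewrite, the optimization profile in `build_engine` would be registered under “dense_input” with the same `(BATCH_SIZE, LEN_INPUT)` min/opt/max shapes as in the original script.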
