TensorRT: Cannot set bindings for dynamic shapes

Description

Hi, I cannot get the following script to run. The error apparently originates from trying to bind a dynamic shape. Oddly, it only shows up with TensorRT 8 on my laptop - I had a nearly identical script running with TensorRT 7 on an AGX Xavier without any hassle:

trt.py:

import tensorflow as tf
import keras2onnx
import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import time

tf.compat.v1.disable_v2_behavior() 

INPUT_NAME = 'input'
BATCH_SIZE = 100
LEN_INPUT = 5
LEN_OUTPUT = 3
TF_FILE_PATH = './model.h5'
ONNX_FILE_PATH = './model.onnx'
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def convert_to_onnx():
    model = tf.keras.models.load_model(TF_FILE_PATH)
    onnx_model = keras2onnx.convert_keras(model)
    keras2onnx.save_model(onnx_model, ONNX_FILE_PATH)


def build_engine():
    network_creation_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)   

    # initialize TensorRT engine and parse ONNX model
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(network_creation_flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()

    # set input shape min/opt/max for optimization profile
    profile.set_shape(INPUT_NAME, (BATCH_SIZE, LEN_INPUT), \
        (BATCH_SIZE, LEN_INPUT), (BATCH_SIZE, LEN_INPUT))
    print('Profile shape: ' + str(profile.get_shape(INPUT_NAME)))
    add_succ = config.add_optimization_profile(profile)
    print('Added optimization profile successfully: ' + str(add_succ))

    # specify batch size for builder
    builder.max_batch_size = BATCH_SIZE

    # parse ONNX
    print('Beginning ONNX file parsing')
    with open(ONNX_FILE_PATH, 'rb') as model:
        parser.parse(model.read())
    print('Completed parsing of ONNX file')

    # generate TensorRT engine optimized for the target platform
    print('Building an engine...')
    engine = builder.build_engine(network, config=config)
    context = engine.create_execution_context()
    print("Completed creating Engine")

    # set input dimensions at runtime
    print('Engine binding shape: ' + str(engine.get_profile_shape(profile_index=0, binding=0)))
    context.set_binding_shape(0, (BATCH_SIZE, LEN_INPUT))
    print('Binding set')

    return engine, context


def inference(engine, context):
    d_input = cuda.mem_alloc(BATCH_SIZE * LEN_INPUT * np.dtype(np.float32).itemsize)
    # output buffer holds LEN_OUTPUT values per sample, matching the host array below
    d_output = cuda.mem_alloc(BATCH_SIZE * LEN_OUTPUT * np.dtype(np.float32).itemsize)
    bindings = [int(d_input), int(d_output)]

    stream = cuda.Stream()

    t1 = time.time()

    for i in range(500):
        input = np.random.random((BATCH_SIZE,LEN_INPUT)).astype(np.float32)
        output = np.random.random((BATCH_SIZE,LEN_OUTPUT)).astype(np.float32)
        cuda.memcpy_htod_async(d_input, input, stream)
        context.execute_async(BATCH_SIZE, bindings, stream.handle, None)
        cuda.memcpy_dtoh_async(output, d_output, stream)
        stream.synchronize()
        #print ("Prediction: " + str(output))

    print('Total execution time: ' + str(time.time() - t1)) 

if __name__ == '__main__':
    convert_to_onnx()
    engine, context = build_engine()
    inference(engine, context)

console output:

2021-06-21 09:06:08.543047: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/keras/initializers/initializers_v1.py:47: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-06-21 09:06:09.918908: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.919050: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-21 09:06:09.919222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-21 09:06:09.919590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA Quadro M2000M computeCapability: 5.0
coreClock: 1.137GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 74.65GiB/s
2021-06-21 09:06:09.919607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-21 09:06:09.919677: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-21 09:06:09.919748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-21 09:06:09.920580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-21 09:06:09.920613: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-21 09:06:09.920770: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64:/home/bcaie/xavier_hybrid_models/catkin_ws/devel/lib:/opt/ros/noetic/lib
2021-06-21 09:06:09.921560: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-21 09:06:09.921591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-21 09:06:09.921599: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-21 09:06:09.922140: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.922166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-21 09:06:09.922175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2021-06-21 09:06:09.925542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-06-21 09:06:09.926639: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2799925000 Hz
tf executing eager_mode: False
tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 13 -> 8
The maximum opset needed by this model is only 9.
None
Profile shape: [(100, 5), (100, 5), (100, 5)]
Added optimization profile successfully: 0
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine...
trt.py:56: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config=config)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
Completed creating Engine
Engine binding shape: [(0, 5), (0, 5), (0, 5)]
[TensorRT] ERROR: [executionContext.cpp::setBindingDimensions::954] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::954, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [100,5] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 0, minimum dimension in profile is 0, but supplied dimension is 100.
)
Binding set
Total execution time: 0.014324188232421875
terminate called after throwing an instance of 'nvinfer1::CudaDriverError'
  what():  TensorRT internal error
Aborted (core dumped)
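
For readers following the log: the parameter check quoted in the error message compares each supplied dimension with the profile's min/max range. The snippet below is only a plain-Python paraphrase of that condition, not TensorRT code; the function name check_binding_shape is made up for illustration, and the shapes are the ones printed above.

```python
def check_binding_shape(shape, profile_min, profile_max):
    """Return the first index whose dimension lies outside [min, max], or None.

    Plain-Python paraphrase of the range check named in the error message;
    this is an illustration, not the TensorRT implementation.
    """
    for i, (dim, lo, hi) in enumerate(zip(shape, profile_min, profile_max)):
        if not (lo <= dim <= hi):
            return i
    return None


# Shapes from the console output: the engine reports (0, 5) for min and max,
# while the script supplies (100, 5).
failing_index = check_binding_shape((100, 5), (0, 5), (0, 5))
print("first out-of-range index:", failing_index)
```

With the profile the script intended, (100, 5) for min and max, the same check finds no out-of-range dimension, so the mismatch is between the profile that was set and the profile the engine actually reports.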

Environment

TensorRT Version: 8.0.0.3
GPU Type: Quadro M2000
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.0.53
Operating System + Version: Ubuntu 20.04.1
Python Version (if applicable): 3.8.5
TensorFlow Version (if applicable): 2.5.0
Baremetal or Container (if container which image + tag): Baremetal + python virtualenv

Relevant Files

I’ve attached both the original .h5 file from Keras and the converted .onnx file, together with the Netron outputs, in case that helps.
model.h5 (33.8 KB)
model.onnx (1.8 KB)


Steps To Reproduce

python3 trt.py

Hi,
This looks like a Jetson issue. We recommend that you raise it on the respective platform via the link below.

Thanks!

Thanks, but it is NOT a Jetson issue. A slightly modified version of the code runs fine on a Jetson with TensorRT 7, but it fails on an HP ZBook with the configuration described above.

EDIT: I was wondering whether it is related to the name of the input. It is currently set to “input”, but the .onnx file has “dense_input” as its input. However, the script also crashes if I change the input name to “dense_input”.

Thanks, I was able to fix the problem:

  • set the input name to “dense_input”
  • make the first dimension of the input shape dynamic using graphsurgeon
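
For anyone landing here later, the second step might look roughly like the sketch below. It assumes the onnx and onnx_graphsurgeon packages are installed; the thread does not show the exact code used, so the function names, the symbolic dimension name "batch", and the output file name are illustrative assumptions.

```python
def with_dynamic_batch(shape, batch_name="batch"):
    """Return a copy of `shape` whose first dimension is a symbolic name."""
    return [batch_name] + list(shape[1:])


def make_batch_dynamic(src_path, dst_path):
    """Rewrite an ONNX file so graph inputs and outputs get a dynamic first dimension."""
    # Imported lazily so with_dynamic_batch can be used without these packages.
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load(src_path))
    for tensor in list(graph.inputs) + list(graph.outputs):
        # Skip tensors with unknown or empty (scalar) shapes.
        if tensor.shape:
            tensor.shape = with_dynamic_batch(tensor.shape)
    onnx.save(gs.export_onnx(graph), dst_path)


# Example call (the output file name is hypothetical):
# make_batch_dynamic("./model.onnx", "./model_dynamic.onnx")
print(with_dynamic_batch((100, 5)))
```

The patched file would then be the one handed to the parser, with INPUT_NAME set to 'dense_input' so that profile.set_shape refers to an input that actually exists in the network.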