Description
Hi, I cannot get the following script to run. Apparently the error originates from trying to bind a dynamic shape. Oddly, the error only appears with TensorRT 8 on my laptop; an almost identical script ran with TensorRT 7 on an AGX Xavier without any hassle:
trt.py:
import tensorflow as tf
import keras2onnx
import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import time

tf.compat.v1.disable_v2_behavior()

INPUT_NAME = 'input'
BATCH_SIZE = 100
LEN_INPUT = 5
LEN_OUTPUT = 3
TF_FILE_PATH = './model.h5'
ONNX_FILE_PATH = './model.onnx'
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def convert_to_onnx():
    model = tf.keras.models.load_model(TF_FILE_PATH)
    onnx_model = keras2onnx.convert_keras(model)
    keras2onnx.save_model(onnx_model, ONNX_FILE_PATH)

def build_engine():
    network_creation_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    # initialize TensorRT engine and parse ONNX model
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(network_creation_flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    # set input shape min/opt/max for optimization profile
    profile.set_shape(INPUT_NAME, (BATCH_SIZE, LEN_INPUT),
                      (BATCH_SIZE, LEN_INPUT), (BATCH_SIZE, LEN_INPUT))
    print('Profile shape: ' + str(profile.get_shape(INPUT_NAME)))
    add_succ = config.add_optimization_profile(profile)
    print('Added optimization profile successfully: ' + str(add_succ))
    # specify batch size for builder
    builder.max_batch_size = BATCH_SIZE
    # parse ONNX
    print('Beginning ONNX file parsing')
    with open(ONNX_FILE_PATH, 'rb') as model:
        parser.parse(model.read())
    print('Completed parsing of ONNX file')
    # generate TensorRT engine optimized for the target platform
    print('Building an engine...')
    engine = builder.build_engine(network, config=config)
    context = engine.create_execution_context()
    print("Completed creating Engine")
    # set input dimensions at runtime
    print('Engine binding shape: ' + str(engine.get_profile_shape(profile_index=0, binding=0)))
    context.set_binding_shape(0, (BATCH_SIZE, LEN_INPUT))
    print('Binding set')
    return engine, context

def inference(engine, context):
    d_input = cuda.mem_alloc(BATCH_SIZE * LEN_INPUT * np.dtype(np.float32).itemsize)
    d_output = cuda.mem_alloc(BATCH_SIZE * LEN_INPUT * np.dtype(np.float32).itemsize)
    bindings = [int(d_input), int(d_output)]
    stream = cuda.Stream()
    t1 = time.time()
    for i in range(500):
        input = np.random.random((BATCH_SIZE,LEN_INPUT)).astype(np.float32)
        output = np.random.random((BATCH_SIZE,LEN_OUTPUT)).astype(np.float32)
        cuda.memcpy_htod_async(d_input, input, stream)
        context.execute_async(BATCH_SIZE, bindings, stream.handle, None)
        cuda.memcpy_dtoh_async(output, d_output, stream)
        stream.synchronize()
        #print ("Prediction: " + str(output))
    print('Total execution time: ' + str(time.time() - t1))

if __name__ == '__main__':
    convert_to_onnx()
    engine, context = build_engine()
    inference(engine, context)
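Since the optimization profile is keyed by INPUT_NAME = 'input', one thing that may be worth checking is whether the parsed network really has an input with that name. The following is only a debugging sketch, not a confirmed diagnosis. The helper names list_network_inputs and inputs_missing_from_profile are hypothetical; they rely only on num_inputs, get_input(i), .name and .shape of the network object, so they could be called right after parser.parse(...) in build_engine().

```python
def list_network_inputs(network):
    """Return (name, shape) for every input tensor of a parsed network.

    `network` is expected to behave like a TensorRT INetworkDefinition,
    i.e. to expose `num_inputs` and `get_input(index)`.
    """
    inputs = []
    for i in range(network.num_inputs):
        tensor = network.get_input(i)
        # tuple() makes the dims easy to print and compare;
        # a value of -1 marks a dynamic axis.
        inputs.append((tensor.name, tuple(tensor.shape)))
    return inputs


def inputs_missing_from_profile(profile_input_names, network):
    """Return the network input names that the profile never mentions."""
    wanted = set(profile_input_names)
    return [name for name, _ in list_network_inputs(network) if name not in wanted]
```

In the script above this could be used as print(inputs_missing_from_profile([INPUT_NAME], network)). A non-empty result would suggest that the shapes passed to profile.set_shape were attached to a name the network does not use.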
console output:
2021-06-21 09:06:08.543047: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/keras/initializers/initializers_v1.py:47: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-06-21 09:06:09.918908: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.919050: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-21 09:06:09.919222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-21 09:06:09.919590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA Quadro M2000M computeCapability: 5.0
coreClock: 1.137GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 74.65GiB/s
2021-06-21 09:06:09.919607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-21 09:06:09.919677: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-21 09:06:09.919748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-21 09:06:09.920580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-21 09:06:09.920613: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-21 09:06:09.920770: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64:/home/bcaie/xavier_hybrid_models/catkin_ws/devel/lib:/opt/ros/noetic/lib
2021-06-21 09:06:09.921560: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-21 09:06:09.921591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-21 09:06:09.921599: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-21 09:06:09.922140: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.922166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-21 09:06:09.922175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-06-21 09:06:09.925542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-06-21 09:06:09.926639: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2799925000 Hz
tf executing eager_mode: False
tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 13 -> 8
The maximum opset needed by this model is only 9.
None
Profile shape: [(100, 5), (100, 5), (100, 5)]
Added optimization profile successfully: 0
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine...
trt.py:56: DeprecationWarning: Use build_serialized_network instead.
engine = builder.build_engine(network, config=config)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
Completed creating Engine
Engine binding shape: [(0, 5), (0, 5), (0, 5)]
[TensorRT] ERROR: [executionContext.cpp::setBindingDimensions::954] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::954, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [100,5] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 0, minimum dimension in profile is 0, but supplied dimension is 100.
)
Binding set
Total execution time: 0.014324188232421875
terminate called after throwing an instance of 'nvinfer1::CudaDriverError'
what(): TensorRT internal error
Aborted (core dumped)
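The parameter check quoted in the error (profileMaxDims.d[i] >= dimensions.d[i], together with the matching minimum check) can be restated in a few lines of plain Python, which shows why (100, 5) is rejected against the (0, 5) range the engine reports. This only illustrates the range condition and is not TensorRT's implementation; first_out_of_range_axis is a hypothetical helper name.

```python
def first_out_of_range_axis(shape, min_shape, max_shape):
    """Return the first axis where `shape` lies outside [min, max], else None."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        raise ValueError("shape, min_shape and max_shape must have the same rank")
    for axis, (dim, lo, hi) in enumerate(zip(shape, min_shape, max_shape)):
        if not lo <= dim <= hi:
            return axis
    return None


# Values taken from the log above: the engine reports min = opt = max = (0, 5),
# while the script requests (100, 5).
requested = (100, 5)
engine_min, engine_opt, engine_max = [(0, 5), (0, 5), (0, 5)]
print(first_out_of_range_axis(requested, engine_min, engine_max))  # 0

# With the range the profile was meant to carry, the same request passes.
print(first_out_of_range_axis(requested, (100, 5), (100, 5)))  # None
```

The same comparison could be run against the result of engine.get_profile_shape(profile_index=0, binding=0) before calling context.set_binding_shape, so that a mismatch is reported by the script rather than only by the TensorRT logger.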
Environment
TensorRT Version: 8.0.0.3
GPU Type: Quadro M2000
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.0.53
Operating System + Version: Ubuntu 20.04.1
Python Version (if applicable): 3.8.5
TensorFlow Version (if applicable): 2.5.0
Baremetal or Container (if container which image + tag): Baremetal + python virtualenv
Relevant Files
I’ve added both the original .h5 file from Keras and the converted .onnx file, together with the Netron outputs, in case they help.
model.h5 (33.8 KB)
model.onnx (1.8 KB)
Steps To Reproduce
python3 trt.py