I was having problems converting a tf.keras model to a TensorRT model, and I have narrowed the problem down to the TensorRT conversion step.
According to NVIDIA’s TensorRT guide, the tf.keras-to-TensorRT process looks like this
(the first two steps are common steps that we run in our own cloud; the last two conversions are done on the Jetson Nano):
tf.keras model → frozen_model → UFF → TensorRT
The result: after the final conversion, inference with the model is inaccurate.
My TensorFlow version is 2.0 (the model was also generated with TF 2.0), while the TF version on the Jetson Nano is 1.x. However, the keras-to-pb conversion uses the tf.compat.v1 compatibility package, which is equivalent to converting the v2 Keras model to a v1 frozen model. If the Keras data structure were the problem, the predictions at this step should already be wrong. But my actual test result is that the Keras and pb models give identical predictions. Therefore I conclude that it is the final conversion that is inaccurate, and that this has nothing to do with TF2.
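The freezing step is roughly like this (a minimal sketch of the tf.compat.v1 approach; 'model.h5' is a placeholder for my actual model file):

import tensorflow as tf

# Run the v1 graph/session machinery inside TF 2.0
tf.compat.v1.disable_eager_execution()
tf.compat.v1.keras.backend.set_learning_phase(0)  # inference mode, dropout disabled

model = tf.keras.models.load_model('model.h5')  # placeholder filename
sess = tf.compat.v1.keras.backend.get_session()

# Bake the variables into constants so the graph is self-contained
frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(),
    [out.op.name for out in model.outputs])  # e.g. 'dense_1/BiasAdd'

tf.io.write_graph(frozen, '.', 'frozen_model.pb', as_text=False)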
My test results:
tf.keras model √
Converted to frozen_model √ (predictions match the Keras model; see the sketch after this list)
Converted to UFF (conversion succeeds)
Converted to TensorRT (conversion succeeds, but the predictions are inaccurate)
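The keras-vs-pb comparison was done along these lines (a minimal sketch; 'model.h5' is again a placeholder, and the node names are the same ones registered in convertTRT below):

import numpy as np
import tensorflow as tf

# Feed the same random input to both models
x = np.random.rand(1, 40000).astype(np.float32)

# Prediction from the original tf.keras model
keras_out = tf.keras.models.load_model('model.h5').predict(x)

# Prediction from the frozen graph via the v1 session API
graph_def = tf.compat.v1.GraphDef()
with open('frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as g:
    tf.compat.v1.import_graph_def(graph_def, name='')
    with tf.compat.v1.Session(graph=g) as sess:
        pb_out = sess.run('dense_1/BiasAdd:0',
                          feed_dict={'reshape_input:0': x})

# Both outputs agree, so the frozen model itself is fine
print(np.allclose(keras_out, pb_out, atol=1e-5))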
Here is the code:
Model
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
reshape (Reshape) (None, 100, 400, 1) 0
_________________________________________________________________
conv2d (Conv2D) (None, 100, 400, 32) 64
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 50, 200, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 50, 200, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 50, 200, 64) 2112
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 25, 100, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 25, 100, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 25, 100, 64) 4160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 50, 64) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 13, 50, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 41600) 0
_________________________________________________________________
dense (Dense) (None, 1024) 42599424
_________________________________________________________________
dropout_3 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 203) 208075
=================================================================
Total params: 42,813,835
Trainable params: 42,813,835
Non-trainable params: 0
_________________________________________________________________
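For context, the summary corresponds to roughly this tf.keras definition (a reconstruction from the summary above; the (1, 1) kernel sizes are inferred from the parameter counts, and the activations and dropout rates are assumptions, so the real training code may differ):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Reshape((100, 400, 1), input_shape=(40000,)),
    layers.Conv2D(32, (1, 1), activation='relu'),   # 64 params -> 1x1 kernel
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Dropout(0.25),                           # rate assumed
    layers.Conv2D(64, (1, 1), activation='relu'),   # 2112 params -> 1x1 kernel
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Dropout(0.25),                           # rate assumed
    layers.Conv2D(64, (1, 1), activation='relu'),   # 4160 params -> 1x1 kernel
    layers.MaxPooling2D((2, 2), padding='same'),    # 'same' pooling gives 25 -> 13
    layers.Dropout(0.25),                           # rate assumed
    layers.Flatten(),                               # 13 * 50 * 64 = 41600
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.5),                            # rate assumed
    layers.Dense(203),                              # 203 = 7 * 29 outputs
])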
convertTRT
import uff
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Frozen graph -> UFF, with dense_1/BiasAdd as the single output node
uff_model = uff.from_tensorflow_frozen_model('frozen_model.pb', ['dense_1/BiasAdd'], output_filename='tmp.uff')

# UFF -> TensorRT engine, serialized to model.bin
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 28
    builder.max_batch_size = 1
    parser.register_input('reshape_input', (1, 40000))
    parser.register_output('dense_1/BiasAdd')
    parser.parse('tmp.uff', network)
    engine = builder.build_cuda_engine(network)
    buf = engine.serialize()
    with open('model.bin', 'wb') as f:
        f.write(buf)
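To double-check that 'reshape_input' and 'dense_1/BiasAdd' are the right node names, the frozen graph's nodes can be listed like this (a small sketch using the tf.compat.v1 GraphDef API):

import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with open('frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# The input placeholder and the final BiasAdd should show up
# as 'reshape_input' and 'dense_1/BiasAdd'
for node in graph_def.node:
    print(node.op, node.name)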
inference
import numpy as np
import cv2 as cv
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context

# initialize
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
runtime = trt.Runtime(TRT_LOGGER)

# create engine
with open('model.bin', 'rb') as f:
    buf = f.read()
    engine = runtime.deserialize_cuda_engine(buf)

# create buffers: one page-locked host buffer and one device buffer per binding
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []
stream = cuda.Stream()
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    host_mem = cuda.pagelocked_empty(size, np.float32)
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)
    bindings.append(int(cuda_mem))
    if engine.binding_is_input(binding):
        host_inputs.append(host_mem)
        cuda_inputs.append(cuda_mem)
    else:
        host_outputs.append(host_mem)
        cuda_outputs.append(cuda_mem)
context = engine.create_execution_context()

# preprocess: grayscale, scale to [0, 1], flatten to (1, 40000)
ori = cv.imread('test.jpg')
image = cv.cvtColor(ori, cv.COLOR_BGR2RGB)
image = convert2gray(image)  # custom helper, sketched below
image = image.flatten() / 255.0
image = np.expand_dims(image, axis=0)

# copy input to the device, run inference, copy output back
np.copyto(host_inputs[0], image.ravel())
cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
context.execute_async(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
stream.synchronize()

# the output is the flat [7 * 29] prediction vector
output = host_outputs[0]
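convert2gray is a small helper of mine; it is along these lines (a sketch assuming a plain single-channel conversion, my actual helper may differ in detail):

import cv2 as cv

def convert2gray(img):
    # Collapse a 3-channel RGB image to a single (H, W) channel
    if img.ndim == 3:
        return cv.cvtColor(img, cv.COLOR_RGB2GRAY)
    return img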
Model file:
https://drive.google.com/drive/folders/1SbobJWI9VJ-i4Bfgv5Jotn9eS0SIhXly?usp=sharing