Hi,
I’m trying to run TensorRT on a YOLOv3 implementation with TF 2.0, which can be found in this repo:
https://github.com/zzh8829/yolov3-tf2
First of all, I create a .pb file of my YOLO model:
# SAVE THE MODEL
def save_model():
    tf.saved_model.save(yolo, saved_model_dir)
Then I convert the SavedModel into a .trt format:
# Convert SavedModel using TF-TRT
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def convert_model_to_trt():
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode='FP16',
        is_dynamic_op=True)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params)
    converter.convert()
    # Despite the .trt suffix, this path is a SavedModel directory
    saved_model_dir_trt = "./tnp/yolov3.trt"
    converter.save(saved_model_dir_trt)
Finally, I run an inference function whose purpose is to get the outputs through the concrete function.
I print the result variable to inspect the output:
# TRT Benchmark - logging the inference time
def run_and_time(saved_model_dir, ref_result=None):
    """Helper method to measure the running time of a SavedModel."""
    NUM_RUNS = 5
    root = tf.saved_model.load(saved_model_dir)
    concrete_func = root.signatures["serving_default"]
    result = None

    img = tf.image.decode_image(open(img_path_test, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, FLAGS.size)

    for _ in range(2):  # warm up
        concrete_func(input_1=img)

    start_time = datetime.datetime.now()
    for i in range(NUM_RUNS):
        result = concrete_func(input_1=img)
    end_time = datetime.datetime.now()

    elapsed = end_time - start_time
    print(result)
    result = result[list(result.keys())[0]]
    msgs.append("------> time for %d runs: %s" % (NUM_RUNS, str(elapsed)))
    if ref_result is not None:
        msgs.append(
            "------> max diff: %s" % str(np.max(np.abs(result - ref_result))))
    return result
logging.info('weights loaded')
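For reference, the warm-up-then-measure pattern used in run_and_time can be written as a small standalone helper. This is only a sketch: time_callable and its parameter names are hypothetical, and it assumes the call to be timed can be wrapped in a zero-argument callable.

```python
import time


def time_callable(fn, num_runs=5, warmup=2):
    """Return (last_result, mean_seconds_per_run) for a zero-argument callable."""
    if num_runs <= 0:
        raise ValueError("num_runs must be positive")
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    result = None
    start = time.perf_counter()  # monotonic clock intended for interval timing
    for _ in range(num_runs):
        result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed / num_runs
```

With the names from the function above, it could be called as time_callable(lambda: concrete_func(input_1=img)). If the call returns before the GPU work has finished, the measured time may be optimistic; converting one output to NumPy inside the callable is one way to force completion.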
The output of the result variable is (all values are zero):
<class 'dict'>: {'yolo_nms_1': <tf.Tensor: id=75969, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_2': <tf.Tensor: id=75970, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_3': <tf.Tensor: id=75971, shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>, 'yolo_nms': <tf.Tensor: id=75968, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]], dtype=float32)>}
Example of TensorFlow outputs when calling yolo(img) without TRT:
(<tf.Tensor: id=85563, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0.5706494 , 0.08093378, 0.90879405, 0.76223075],
[0.6956264 , 0.637429 , 0.7248049 , 0.6526146 ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
...
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]]], dtype=float32)>, <tf.Tensor: id=85564, shape=(1, 100), dtype=float32, numpy=
array([[0.60076845, 0.29851934, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
...
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]],
dtype=float32)>, <tf.Tensor: id=85565, shape=(1, 100), dtype=float32, numpy=
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
....
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: id=85566, shape=(1,), dtype=int32, numpy=array([2], dtype=int32)>)
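To compare the two dumps programmatically rather than by eye, something like the following could be used. It is a sketch under one assumption: that the signature keys yolo_nms, yolo_nms_1, yolo_nms_2 and yolo_nms_3 correspond, in that order, to the four tensors in the tuple returned by yolo(img), which is what the shapes in the dumps suggest. compare_outputs and OUTPUT_ORDER are hypothetical names.

```python
import numpy as np

# Assumed mapping from signature output names to tuple positions of yolo(img);
# inferred from the shapes printed in the post, not confirmed by the repo.
OUTPUT_ORDER = ("yolo_nms", "yolo_nms_1", "yolo_nms_2", "yolo_nms_3")


def compare_outputs(trt_outputs, ref_outputs, order=OUTPUT_ORDER):
    """Compare a dict of TRT outputs against a tuple of reference outputs.

    Returns {name: {"max_abs_diff": float, "all_zero": bool}}.
    """
    report = {}
    for name, ref in zip(order, ref_outputs):
        # np.asarray also accepts eager tensors, so .numpy() is not required
        got = np.asarray(trt_outputs[name], dtype=np.float64)
        ref = np.asarray(ref, dtype=np.float64)
        if got.shape != ref.shape:
            raise ValueError(
                "shape mismatch for %s: %s vs %s" % (name, got.shape, ref.shape))
        report[name] = {
            "max_abs_diff": float(np.max(np.abs(got - ref))),
            "all_zero": not bool(got.any()),
        }
    return report
```

Calling compare_outputs(result, yolo(img)) before result is reduced to its first key would report, per output, whether the converted model returned only zeros and how far it is from the native model.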
I inspected the TF and TRT SavedModel signatures, and they differ in the output shapes:
TensorFlow:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, -1, -1, 3)
name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
outputs['yolo_nms'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 100, 4)
name: StatefulPartitionedCall:0
outputs['yolo_nms_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 100)
name: StatefulPartitionedCall:1
outputs['yolo_nms_2'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 100)
name: StatefulPartitionedCall:2
outputs['yolo_nms_3'] tensor_info:
dtype: DT_INT32
shape: (-1)
name: StatefulPartitionedCall:3
Method name is: tensorflow/serving/predict
TensorRT:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, -1, -1, 3)
name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
outputs['yolo_nms'] tensor_info:
dtype: DT_FLOAT
shape: unknown_rank
name: PartitionedCall:0
outputs['yolo_nms_1'] tensor_info:
dtype: DT_FLOAT
shape: unknown_rank
name: PartitionedCall:1
outputs['yolo_nms_2'] tensor_info:
dtype: DT_FLOAT
shape: unknown_rank
name: PartitionedCall:2
outputs['yolo_nms_3'] tensor_info:
dtype: DT_INT32
shape: unknown_rank
name: PartitionedCall:3
Method name is: tensorflow/serving/predict
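The same comparison can be made from Python in the session that loads the models. The sketch below assumes each signature object exposes a structured_outputs mapping of output names to specs with dtype and shape attributes, as TF 2 concrete functions do; describe_outputs and diff_signatures are hypothetical helper names.

```python
def describe_outputs(concrete_func):
    """Map each output name to a (dtype name, shape string) pair."""
    summary = {}
    for name, spec in sorted(concrete_func.structured_outputs.items()):
        # TF dtypes expose .name; fall back to str() for anything else
        dtype = getattr(spec.dtype, "name", str(spec.dtype))
        summary[name] = (dtype, str(spec.shape))
    return summary


def diff_signatures(ref_func, trt_func):
    """Return only the outputs whose dtype or shape differs between the two."""
    ref = describe_outputs(ref_func)
    trt = describe_outputs(trt_func)
    return {
        name: (ref.get(name), trt.get(name))
        for name in sorted(set(ref) | set(trt))
        if ref.get(name) != trt.get(name)
    }
```

Both arguments would be obtained as in run_and_time, that is tf.saved_model.load(path).signatures["serving_default"], once for the original SavedModel directory and once for the converted one.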
My questions are:
- Am I doing the last part wrong, and should I use the .trt engine in another way?
- Is there a simple YOLOv3-TensorRT implementation that works with TensorFlow? (I'm currently checking https://github.com/lewes6369/TensorRT-Yolov3 , but it is used with a Caffe model; I will still check it out.)
- Should I try converting to .onnx and running the inference there with the provided sample (NVIDIA Sample Support Guide :: NVIDIA Deep Learning TensorRT Documentation, sample number 30)?