Running TensorRT on Yolov3 (TF 2.0 implementation)

Hi,
I’m trying to run TensorRT on a YOLOv3 implementation for TF 2.0, which can be found in this repo:
https://github.com/zzh8829/yolov3-tf2

First, I export my YOLO model as a SavedModel (.pb file):

# SAVE THE MODEL
    def save_model():
        tf.saved_model.save(yolo, saved_model_dir)

Then, I convert the saved model into a .trt format:

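# Convert SavedModel using TF-TRT
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    def convert_model_to_trt():
        # FP16 precision; is_dynamic_op=True makes TF-TRT build the
        # TensorRT engines at runtime, on the first inference call
        params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
            precision_mode='FP16',
            is_dynamic_op=True)
        converter = trt.TrtGraphConverterV2(
            input_saved_model_dir=saved_model_dir,
            conversion_params=params)
        converter.convert()
        # save() writes a TF SavedModel directory, not a standalone TensorRT engine
        saved_model_dir_trt = "./tnp/yolov3.trt"
        converter.save(saved_model_dir_trt)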
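Despite the .trt suffix in the path, `TrtGraphConverterV2.save()` writes an ordinary TensorFlow SavedModel directory (a `saved_model.pb` plus a `variables/` folder), which is why it is loaded back with `tf.saved_model.load` rather than as a serialized TensorRT engine. A minimal stdlib sketch to sanity-check an export directory and compare its size with the original export; both helper names are hypothetical:

```python
import os


def is_saved_model_dir(path):
    """Return True if `path` looks like a TensorFlow SavedModel directory.

    A SavedModel directory has a serialized `saved_model.pb`
    (or the text form `saved_model.pbtxt`) at its top level.
    """
    if not os.path.isdir(path):
        return False
    return any(
        os.path.isfile(os.path.join(path, name))
        for name in ("saved_model.pb", "saved_model.pbtxt")
    )


def dir_size_bytes(path):
    """Total size in bytes of all files under `path` (recursively)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

For example, after `converter.save(saved_model_dir_trt)` you would expect `is_saved_model_dir(saved_model_dir_trt)` to be True, and `dir_size_bytes` on both directories shows how much the conversion grew the export.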

Finally, I run an inference function whose purpose is to get the outputs through the concrete function.
I’m inspecting the result variable in the debugger to see the output:

# TRT Benchmark - logging the inference time
    def run_and_time(saved_model_dir, ref_result=None):
        """Helper method to measure the running time of a SavedModel."""
        NUM_RUNS = 5
        root = tf.saved_model.load(saved_model_dir)
        concrete_func = root.signatures["serving_default"]
        result = None
        # tf.io.read_file avoids leaving the image file handle open
        img = tf.image.decode_image(tf.io.read_file(img_path_test), channels=3)
        img = tf.expand_dims(img, 0)
        img = transform_images(img, FLAGS.size)
        for _ in range(2):  # warm up (with is_dynamic_op=True, TRT engines are built here)
            concrete_func(input_1=img)

        start_time = datetime.datetime.now()
        for _ in range(NUM_RUNS):
            result = concrete_func(input_1=img)
        end_time = datetime.datetime.now()

        elapsed = end_time - start_time
        print(result)
        # The signature returns a dict; pick the output by name instead of by
        # dict position, which can differ between the TF and TF-TRT models
        result = result['yolo_nms']  # the (1, 100, 4) output

        # `msgs` is a list defined elsewhere in the script
        msgs.append("------> time for %d runs: %s" % (NUM_RUNS, str(elapsed)))
        if ref_result is not None:
            msgs.append(
                "------> max diff: %s" % str(np.max(np.abs(result - ref_result))))
        return result

    logging.info('weights loaded')
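The measurement pattern in run_and_time (untimed warm-up calls, then a fixed number of timed runs) can be factored into a small stdlib helper that reports per-run latency instead of one total. This is a sketch: `benchmark` is a hypothetical name, and `fn` stands for any callable such as `concrete_func`. It assumes `fn` returns only once its result is ready; with TF on a GPU, converting an output to NumPy (for example `.numpy()`) inside `fn` is one way to make sure the work has finished before the clock stops. The warm-up matters for TF-TRT with `is_dynamic_op=True`, because the TensorRT engines are built during the first calls.

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, runs=5, **kwargs):
    """Call fn(*args, **kwargs) `warmup` times untimed, then `runs` times timed.

    Returns (mean_seconds, stdev_seconds, last_result).
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn(*args, **kwargs)  # untimed: lets one-time setup (e.g. engine builds) happen
    timings = []
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    stdev = statistics.stdev(timings) if runs > 1 else 0.0
    return statistics.mean(timings), stdev, result
```

Usage would look like `mean_s, std_s, out = benchmark(concrete_func, input_1=img, runs=50)`, run once for the TF model and once for the TF-TRT model so the per-run means can be compared directly.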

The outputs in the result variable are (all of them are zeros):

<class 'dict'>: {'yolo_nms_1': <tf.Tensor: id=75969, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_2': <tf.Tensor: id=75970, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_3': <tf.Tensor: id=75971, shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>, 'yolo_nms': <tf.Tensor: id=75968, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
     ...
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)>}

Example of the TensorFlow outputs when calling yolo(img) without TRT:

(<tf.Tensor: id=85563, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0.5706494 , 0.08093378, 0.90879405, 0.76223075],
        [0.6956264 , 0.637429  , 0.7248049 , 0.6526146 ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
       ...
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]]], dtype=float32)>, <tf.Tensor: id=85564, shape=(1, 100), dtype=float32, numpy=
array([[0.60076845, 0.29851934, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
      ...
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ]],
      dtype=float32)>, <tf.Tensor: id=85565, shape=(1, 100), dtype=float32, numpy=
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        ....
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: id=85566, shape=(1,), dtype=int32, numpy=array([2], dtype=int32)>)
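Judging by the shapes and values, the four tensors are, in order, boxes (1, 100, 4), scores (1, 100), classes (1, 100) and the number of valid detections (1,); only the first valid-count entries of each are meaningful and the rest is zero padding. Matching the signature below by output index suggests yolo_nms → boxes, yolo_nms_1 → scores, yolo_nms_2 → classes and yolo_nms_3 → count, so the all-zero TRT result above reports 0 detections where plain TF finds 2. That mapping is an inference, not something the repo documents here. A minimal stdlib sketch that keeps only the valid entries for the first image; `valid_detections` is a hypothetical helper:

```python
def valid_detections(boxes, scores, classes, num_valid):
    """Keep only the real detections of the first image in a padded NMS output.

    boxes:     [[[4 coords], ...]]  shape (1, 100, 4)
    scores:    [[s, ...]]           shape (1, 100)
    classes:   [[c, ...]]           shape (1, 100)
    num_valid: [n]                  shape (1,)
    Returns a list of (box, score, class_id) tuples.
    """
    n = int(num_valid[0])
    return [
        (box, score, int(cls))
        for box, score, cls in zip(boxes[0][:n], scores[0][:n], classes[0][:n])
    ]
```

With TF tensors, call `.numpy()` on each output first; NumPy arrays index the same way as the nested lists used here.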

I inspected the SignatureDefs of the TF and TF-TRT SavedModels, and their output shapes differ:
TensorFlow:

The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['yolo_nms'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100, 4)
      name: StatefulPartitionedCall:0
  outputs['yolo_nms_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: StatefulPartitionedCall:1
  outputs['yolo_nms_2'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: StatefulPartitionedCall:2
  outputs['yolo_nms_3'] tensor_info:
      dtype: DT_INT32
      shape: (-1)
      name: StatefulPartitionedCall:3
Method name is: tensorflow/serving/predict

TensorRT:

The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['yolo_nms'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:0
  outputs['yolo_nms_1'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:1
  outputs['yolo_nms_2'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:2
  outputs['yolo_nms_3'] tensor_info:
      dtype: DT_INT32
      shape: unknown_rank
      name: PartitionedCall:3
Method name is: tensorflow/serving/predict
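Comparing the two listings by eye is error-prone. A stdlib sketch that parses text in the format above (it looks like `saved_model_cli show` output) and reports the entries whose shapes differ; the helper names are hypothetical. Note that `unknown_rank` only means the converted signature no longer records static output shapes; by itself it does not necessarily mean the output values are wrong.

```python
import re

# Matches e.g. "  outputs['yolo_nms'] tensor_info:"
_KEY_RE = re.compile(r"^\s*(inputs|outputs)\['([^']+)'\] tensor_info:")
# Matches e.g. "      shape: (-1, 100, 4)" or "      shape: unknown_rank"
_SHAPE_RE = re.compile(r"^\s*shape:\s*(.+?)\s*$")


def parse_signature(text):
    """Map (kind, key) -> shape string, where kind is 'inputs' or 'outputs'."""
    shapes = {}
    current = None
    for line in text.splitlines():
        m = _KEY_RE.match(line)
        if m:
            current = (m.group(1), m.group(2))
            continue
        m = _SHAPE_RE.match(line)
        if m and current is not None:
            shapes[current] = m.group(1)
            current = None
    return shapes


def diff_signatures(a_text, b_text):
    """Return {(kind, key): (shape_a, shape_b)} for entries whose shapes differ."""
    a, b = parse_signature(a_text), parse_signature(b_text)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }
```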

My questions are:

  1. Am I doing the last part wrong? Should I use the .trt engine in a different way?

  2. Is there a simple YOLOv3-TensorRT implementation that works with TensorFlow? (I’m currently checking https://github.com/lewes6369/TensorRT-Yolov3, but it is used with a Caffe model; I’ll still check it out.)

  3. Should I try converting to ONNX and running inference with NVIDIA’s provided sample (yolov3_onnx, number 30 in https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#yolov3_onnx)?

Well, after exporting to the .trt file again, it eventually worked, BUT…

There’s no speed improvement. My 10 MB .pb turned into a 500 MB .pb file, and inference speed is the same with both. I’m still investigating; this is probably a problem with the export to the .trt file.

I’ll open a new thread about the conversion errors soon. Hopefully someone can shed some light on them.


Hi AnaRhisT,

Hopefully something in this issue will help you debug: https://github.com/tensorflow/tensorrt/issues/89. In particular, check your log output for something like num_nodes(trt_only): 0, as mentioned in several comments, to confirm whether your TF-TRT conversion actually converted any part of the graph.
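In the TF-TRT example scripts, num_nodes(trt_only) is presumably the number of TRTEngineOp nodes in the converted graph; if it is 0, nothing was handed to TensorRT and no speedup can be expected. A duck-typed sketch of that count: it only assumes objects with an `.op` attribute, like the NodeDefs of a GraphDef. In a TF 2.x SavedModel the converted ops typically sit inside function bodies (`graph_def.library.function[i].node_def`) rather than at the top level, so the helper walks both. The name `count_op` and the way you obtain the GraphDef (for example `concrete_func.graph.as_graph_def()`) are assumptions, not something the linked issue prescribes.

```python
def count_op(graph_def, op_type="TRTEngineOp"):
    """Count nodes whose `.op` equals `op_type` in a GraphDef-like object.

    Looks at top-level nodes (`graph_def.node`) and at the bodies of any
    library functions (`graph_def.library.function[i].node_def`), if present.
    """
    count = sum(1 for n in getattr(graph_def, "node", []) if n.op == op_type)
    library = getattr(graph_def, "library", None)
    for fn in getattr(library, "function", []):
        count += sum(1 for n in fn.node_def if n.op == op_type)
    return count
```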

Alternatively, you could try the route of TF -> ONNX using tf2onnx (https://github.com/onnx/tensorflow-onnx/tree/master/tf2onnx) and ONNX -> TRT using trtexec or the TensorRT ONNX Parser API to create an engine from the ONNX model. There are many examples of converting ONNX->TRT online and in the TensorRT samples, like the yolov3_onnx sample you mentioned.
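For the TF → ONNX → TRT route, a sketch that only builds the two command lines (it does not run them). The flags used (`--saved-model`, `--output`, `--opset` for `tf2onnx.convert`; `--onnx`, `--saveEngine`, `--fp16` for `trtexec`) exist in those tools as far as I know, but check `--help` for your installed versions; the function name, paths and default opset are hypothetical placeholders.

```python
import shlex
import sys


def onnx_route_commands(saved_model_dir, onnx_path, engine_path, opset=11, fp16=True):
    """Return argv lists for SavedModel -> ONNX (tf2onnx) and ONNX -> engine (trtexec)."""
    to_onnx = [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--output", onnx_path,
        "--opset", str(opset),
    ]
    to_engine = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        to_engine.append("--fp16")  # same precision as the TF-TRT attempt
    return to_onnx, to_engine


if __name__ == "__main__":
    # Hypothetical paths; print the commands instead of running them
    for cmd in onnx_route_commands("./yolov3_savedmodel", "yolov3.onnx", "yolov3.engine"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; building argv lists instead of shell strings avoids quoting problems with paths.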

Thanks,
NVIDIA Enterprise Support

@AnaRhisT Hi there, I am having trouble optimizing my TensorFlow YOLOv3 with TF-TRT. I have converted the .ckpt to a .pb and I am able to run it in my project for a demo. Now I want to optimize it using TF-TRT, but I don't understand how to actually do it. There are general examples, but they are still not very clear to me. Can you guide me on how to optimize these .pb weights with TF-TRT?