yolov3-tiny with TensorRT

Hello all,

I am having some problems using yolov3-tiny with TensorRT. I trained yolov3-tiny using Darknet and the COCO dataset (with a reduced number of classes), then converted it to TensorFlow using this repository: https://github.com/jinyu121/DW2TF. With this converted model I froze the graph and tested it on one image, getting the output I expected, so I created the UFF file with uff.from_tensorflow_frozen_model:

/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
NOTE: UFF has been tested with TensorFlow 1.12.0. Other versions are not guaranteed to work
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
UFF Version 0.6.3
=== Automatically deduced input nodes ===
[name: "yolov3-tiny/net1"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: -1
      }
      dim {
        size: 416
      }
      dim {
        size: 416
      }
      dim {
        size: 3
      }
    }
  }
}
]
=========================================

Using output node yolov3-tiny/convolutional10/BiasAdd
Using output node yolov3-tiny/convolutional13/BiasAdd
Converting to UFF graph
No. nodes: 128
UFF Output written to ./freeze_tf/yolov3-tiny.uff
UFF Text Output written to ./freeze_tf/yolov3-tiny.pbtxt
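For reference, the conversion step above amounts to roughly the following (a sketch, assuming the uff package bundled with TensorRT; paths and output-node names are the ones from my log, so adjust them for your own graph):

```python
# Sketch of the UFF conversion step. Requires the `uff` package that
# ships with TensorRT; paths and node names match the log above.
import uff

uff.from_tensorflow_frozen_model(
    "./freeze_tf/yolov3-tiny.pb",  # frozen TensorFlow graph
    output_nodes=["yolov3-tiny/convolutional10/BiasAdd",
                  "yolov3-tiny/convolutional13/BiasAdd"],
    output_filename="./freeze_tf/yolov3-tiny.uff",
    text=True,  # also write the human-readable .pbtxt dump
)
```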

However, when I try to build the engine to run inference with TensorRT, I get this error:

[INFO] Building engine
[TensorRT] ERROR: UffParser: Parser error: yolov3-tiny/route2: Concat operation axis is out of bounds for layer yolov3-tiny/route2
[TensorRT] ERROR: Network must have at least one output
[INFO] Allocating buffers
Traceback (most recent call last):
  File "test_rt.py", line 165, in <module>
    inputs, outputs, bindings, stream = allocate_buffers(engine)
  File "/home/cbi/darknet/test_python/common.py", line 126, in allocate_buffers
    for binding in engine:
TypeError: 'NoneType' object is not iterable
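For context, my engine-building code follows the usual TensorRT 5 Python pattern, roughly like the sketch below (names are from my graph; note that the UFF parser assumes NCHW, so the input is registered in CHW order, which may be related to the concat-axis error on the route layer):

```python
# Rough sketch of the engine build, assuming the TensorRT 5.x Python API.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(uff_path):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 28
        # Register the input in CHW order; the UFF parser expects NCHW.
        parser.register_input("yolov3-tiny/net1", (3, 416, 416))
        parser.register_output("yolov3-tiny/convolutional10/BiasAdd")
        parser.register_output("yolov3-tiny/convolutional13/BiasAdd")
        if not parser.parse(uff_path, network):
            raise RuntimeError("Failed to parse UFF file")
        return builder.build_cuda_engine(network)  # None on failure
```

When the parser fails (as in my log), build_cuda_engine returns None, which is why allocate_buffers then crashes with the NoneType error.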

In another experiment I tried to convert my model to a Keras model using the repository https://github.com/qqwweee/keras-yolo3 (using tensorflow.keras instead of standalone Keras). As before, I can freeze my model (and make predictions with TensorFlow) and convert the graph to UFF. In this case I can build the engine and run a prediction, but the output is completely different from the TensorFlow output (far too many detections, and the bounding boxes are either very large or very small). To post-process the output I use the example from TensorRT/samples/python/yolov3_onnx.

Could you please help me with these problems?

System: Ubuntu 16
Python 3.5
TensorFlow 1.12.2 (compiled from source)
TensorRT 5.1.2.2 RC
CUDA 10.1
cuDNN 7.5
NVIDIA driver 418.56
GeForce GTX 1050

UPDATE: I replaced the LeakyReLU layers with ReLU layers in my yolov3-tiny network, but the real problem was that I had forgotten to change the channel order from HWC to CHW before running inference. Now my model is working in FP16, and I will try running it in INT8 later this week.
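For anyone hitting the same issue: the preprocessing fix is just a transpose from HWC to CHW (plus whatever pixel scaling your network expects; [0, 1] scaling here is an assumption based on standard Darknet preprocessing). A minimal NumPy sketch, assuming a 416x416 RGB input:

```python
import numpy as np

def preprocess(image_hwc):
    """Convert an HxWxC uint8 image to the CHW float32 layout TensorRT expects."""
    chw = np.transpose(image_hwc, (2, 0, 1))  # HWC -> CHW
    # Contiguous float32 buffer, scaled to [0, 1] (standard Darknet scaling)
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0

# Example: a dummy 416x416 RGB frame
frame = np.zeros((416, 416, 3), dtype=np.uint8)
print(preprocess(frame).shape)  # -> (3, 416, 416)
```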
