Slow post-processing (up to a second per frame) for yolov3_onnx

Hi all,

I’m using the sample code for converting yolov3 for use in TensorRT. Sample code documentation can be found at

I realised that the post-processing step is extremely slow (up to a second per frame). The cause of the slowdown is

# E.g. in YOLOv3-608, there are three output tensors, which we associate with their
# respective masks. Then we iterate through all output-mask pairs and generate candidates
# for bounding boxes, their corresponding category predictions and their confidences:
boxes, categories, confidences = list(), list(), list()
for output, mask in zip(outputs_reshaped, self.masks):
    box, category, confidence = self._process_feats(output, mask)
    box, category, confidence = self._filter_boxes(box, category, confidence)

which can be found in


Any suggestions on how to improve the post-processing speed?

Basically, this line in the function _process_feats() causes most of the slowdown:

box_class_probs = sigmoid_v(output_reshaped[..., 5:])

It uses np.vectorize, which is essentially a Python for loop over every element. I think the sigmoid function can be rewritten with np.exp() applied to the whole array for speed. Or use the Keras/PyTorch sigmoid instead.
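To make the suggestion above concrete, here is a minimal sketch of the two approaches side by side. I'm assuming the sample builds sigmoid_v by wrapping a scalar sigmoid in np.vectorize; the names sigmoid_scalar and sigmoid_array, the random input, and the (19, 19, 3, 85) shape are just placeholders for illustration, not the sample's actual code.

```python
import math

import numpy as np


def sigmoid_scalar(value):
    """Scalar sigmoid (assumed to resemble what the sample wraps in np.vectorize)."""
    return 1.0 / (1.0 + math.exp(-value))


# np.vectorize calls the Python function once per element, so it is a loop in disguise.
sigmoid_v = np.vectorize(sigmoid_scalar)


def sigmoid_array(values):
    """Sigmoid evaluated on the whole array with a single np.exp call."""
    return 1.0 / (1.0 + np.exp(-values))


# Hypothetical stand-in for one reshaped YOLOv3 output: grid 19x19, 3 anchors, 85 values.
rng = np.random.default_rng(0)
output_reshaped = rng.standard_normal((19, 19, 3, 85)).astype(np.float32)

slow = sigmoid_v(output_reshaped[..., 5:])      # element-by-element Python calls
fast = sigmoid_array(output_reshaped[..., 5:])  # one vectorised NumPy expression

# Both variants should agree up to float32 rounding.
print(np.allclose(slow, fast, atol=1e-6))
```

Note that wrapping np.exp (or any other function) inside np.vectorize again would keep the per-element loop, so the array version has to be called directly on the slice.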

I am experiencing the same problem. I tried using np.exp() and it seems to have made it slower. I also tried doing import keras.backend as K and changing the call to np.vectorize(K.sigmoid), but that ran into some errors. Was this what you were referring to when recommending those improvements?

I sped up this part by using an alternative function for sigmoid, but it's still not fast enough compared to the original YOLOv3.

Check out for some of the things you can try.

Thanks @hengchenkim! I did the same thing after fixing the issues I had with the keras.backend.sigmoid function. There was an improvement but, like you, I still expected it to be much faster, especially given how TensorRT is advertised.

Anybody else have any suggestions to improve speed even further?

I saw a 10x speedup by using np.exp(), and scipy.special.expit was a bit faster still. It's still not as fast as other YOLOv3 implementations, as @hengchenkim said. To improve it further using Keras, I think you could modify this repo: by replacing yolo_body with the TensorRT model and implementing a post-processing function similar to the yolo_eval function in
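If anyone wants to check the sigmoid numbers on their own machine, something like the rough timing sketch below should do. The input array is a random placeholder rather than real network output, the helper names are made up for this post, and SciPy is treated as optional, so the actual figures will depend on your hardware and array sizes.

```python
import math
import timeit

import numpy as np

try:
    from scipy.special import expit  # optional; only timed if SciPy is installed
except ImportError:
    expit = None


def sigmoid_exp(values):
    """Array-wide sigmoid using np.exp."""
    return 1.0 / (1.0 + np.exp(-values))


# Baseline: scalar sigmoid wrapped in np.vectorize (assumed to mirror the sample).
sigmoid_vectorized = np.vectorize(lambda v: 1.0 / (1.0 + math.exp(-v)))


def best_time(func, values, repeats=3):
    """Return the best wall-clock time in seconds over a few single runs."""
    return min(timeit.repeat(lambda: func(values), number=1, repeat=repeats))


# Hypothetical class-score slice for one output scale: 19x19 grid, 3 anchors, 80 classes.
rng = np.random.default_rng(0)
class_logits = rng.standard_normal((19, 19, 3, 80)).astype(np.float32)

candidates = {"np.vectorize": sigmoid_vectorized, "np.exp": sigmoid_exp}
if expit is not None:
    candidates["scipy.special.expit"] = expit

timings = {name: best_time(func, class_logits) for name, func in candidates.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

This only measures the sigmoid itself, so it won't tell you how much of the per-frame time is spent in the rest of the post-processing.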