Inferring Yolo_v3.trt model in python

Hi, I converted the yolov3-mobilenetv2 TLT model to a TRT engine with tlt-converter.
This TRT engine gives good results when inferring through tlt-infer and DeepStream. Next I tried to run the same TensorRT model from a standalone Python script, but with the script I get more false positives.
I have added my preprocessing and postprocessing code here.
Can you let me know if there are any mistakes I need to correct?

Preprocessing steps

def process_image(self,arr):
    image_resized=cv2.resize(arr,(self.model_w, self.model_h))
    img_np = image_resized.astype(np.float32)
    # HWC -> CHW
    img_np = img_np.transpose((2, 0, 1))
    img_np = img_np.ravel()
    return img_np

Postprocessing steps

def _nms_boxes(self, boxes, box_confidences):

    x_coord = boxes[:, 0]
    y_coord = boxes[:, 1]
    width = boxes[:, 2]
    height = boxes[:, 3]

    areas = width * height
    ordered = box_confidences.argsort()[::-1]

    keep = list()
    while ordered.size > 0:
        # Index of the current element:
        i = ordered[0]
        # Record the highest-scoring remaining box; without this, keep stays empty.
        keep.append(i)
        xx1 = np.maximum(x_coord[i], x_coord[ordered[1:]])
        yy1 = np.maximum(y_coord[i], y_coord[ordered[1:]])
        xx2 = np.minimum(x_coord[i] + width[i], x_coord[ordered[1:]] + width[ordered[1:]])
        yy2 = np.minimum(y_coord[i] + height[i], y_coord[ordered[1:]] + height[ordered[1:]])

        width1 = np.maximum(0.0, xx2 - xx1 + 1)
        height1 = np.maximum(0.0, yy2 - yy1 + 1)
        intersection = width1 * height1
        union = (areas[i] + areas[ordered[1:]] - intersection)

        # Compute the Intersection over Union (IoU) score:
        iou = intersection / union

        # The goal of the NMS algorithm is to reduce the number of adjacent bounding-box
        # candidates to a minimum. In this step, we keep only those elements whose overlap
        # with the current bounding box is lower than the threshold:
        indexes = np.where(iou <= self.nms_threshold)[0]
        ordered = ordered[indexes + 1]

    keep = np.array(keep)
    return keep

def postprocess(self, outputs, wh_format=True):
    """Postprocesses the inference output

    Args:
        outputs (list of float): inference output
        min_confidence (float): min confidence to accept detection
        analysis_classes (list of int): indices of the classes to consider

    Returns: list of list tuple: each element is a two list tuple (x, y) representing the corners of a bb
    """
    p_keep_count = outputs[0]
    p_bboxes = outputs[1]
    p_scores = outputs[2]
    p_classes = outputs[3]
    analysis_classes = list(range(self.NUM_CLASSES))
    threshold = self.min_confidence
    # Integer division: the number of sections must be an integer
    p_bboxes = np.array_split(p_bboxes, len(p_bboxes) // 4)
    bbs = []
    class_ids = []
    scores = []
    x_scale = self.img_shape[1] / self.model_w
    y_scale = self.img_shape[0] / self.model_h
    for i in range(p_keep_count[0]):
        assert(p_classes[i] < len(analysis_classes))
        if p_scores[i]>threshold:
            x1 = int(np.round(p_bboxes[i][0]*x_scale))
            y1 = int(np.round(p_bboxes[i][1]*y_scale))
            x2 = int(np.round(p_bboxes[i][2]*x_scale))
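            y2 = int(np.round(p_bboxes[i][3]*y_scale))
            # NOTE: the lines storing each detection are missing from the post.
            # What follows is an assumed reconstruction based on wh_format and
            # on _nms_boxes, which expects boxes as (x, y, width, height).
            if wh_format:
                bbs.append([x1, y1, x2 - x1, y2 - y1])
            else:
                bbs.append([x1, y1, x2, y2])
            class_ids.append(p_classes[i])
            scores.append(p_scores[i])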
    bbs = np.asarray(bbs)
    class_ids = np.asarray(class_ids)
    scores = np.asarray(scores)
    nms_boxes, nms_categories, nscores = list(), list(), list()
    for category in set(class_ids):
        idxs = np.where(class_ids == category)
        box = bbs[idxs]
        category = class_ids[idxs]
        confidence = scores[idxs]

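        keep = self._nms_boxes(box, confidence)
        # NOTE: assumed reconstruction; the post does not show how the kept
        # detections are collected into the returned lists.
        nms_boxes.append(box[keep])
        nms_categories.append(category[keep])
        nscores.append(confidence[keep])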
    if len(nms_boxes)==0:
        return [],[],[]
    return nms_boxes, nms_categories, nscores
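
One detail in `_nms_boxes` above may be worth checking: the intersection sides are computed with `+ 1` (pixel-inclusive convention) while `areas` is plain `width * height`. With that mix, two identical 10x10 boxes give an IoU of 121/79, which is above 1. A minimal sketch of an IoU for (x, y, width, height) boxes that uses one convention throughout is shown below. The helper name `iou_xywh` is hypothetical and is not part of the posted script.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (x, y, width, height), continuous coordinates.

    Hypothetical helper for illustration; the same convention (no +1) is used
    for both the intersection and the box areas.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by 5 pixels horizontally: overlap 50, union 150
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
```

This only concerns the overlap measure. It does not by itself explain extra false positives, which in this thread are traced to preprocessing.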

Can you share

  1. one original image you want to run inference on
  2. the resulting image after inference
  3. the full script

@Morganh Thanks for your reply.
I have attached the input and output images and the main and inference scripts here. (10.4 KB) (1.2 KB)

Hi @Morganh
The TensorRT engine I used here was trained and generated on another system. I have one doubt: do I need to convert the etlt model to a TRT engine on my local system as well?
If yes, how can I convert etlt to a TRT engine in Python?

Yes, if your two systems are not the same (e.g. compute capability or CUDA/cuDNN/TensorRT version), you’d better generate your TRT engine again on your new local system.
You can generate the TRT engine in two ways:

  1. use tlt-export
  2. configure the etlt file in the DeepStream config file, then run DeepStream directly

See the TLT user guide for more detailed info.

@Morganh Yes, DeepStream automatically converts the etlt model to a TRT engine, and the same model works perfectly there. But I need to run inference with a Python script, so I converted the etlt model to a TRT engine on my system with tlt-converter and used the converted engine in the script. I still get false positives.

Can you run tlt-infer? tlt-infer is the default way to run inference.
Actually, TLT only supports two ways: one is tlt-infer, the other is via DeepStream.

Yes, I can run tlt-infer. Is there also any way to infer with a Python script?

Sure, you can. Please debug your code.
I will also check it later.

OK, sure. Thank you for your support.

For preprocessing, please follow Discrepancy between results from tlt-infer and trt engine - #8 by Morganh

@Morganh I did the same, but I am still getting false positives. I don’t know what mistake I made.

def process_image(self,arr):

    image = Image.fromarray(np.uint8(arr))

    image_resized = image.resize(size=(self.model_w, self.model_h))
    # np.float has been removed from recent NumPy releases; float32 matches
    # the earlier version of this function
    img_np = np.array(image_resized,dtype=np.float32)
    # HWC -> CHW
    img_np = img_np.transpose((2, 0, 1))
    img_np = img_np.ravel()
    return img_np

No, it is not the same. You have already changed the aspect ratio. Please do not change the aspect ratio.

Please help me. I didn’t change the aspect ratio, but I am still getting false positives.

import numpy as np
from PIL import Image


class PreprocessYOLO(object):
    """A simple class for loading images with PIL and reshaping them to the specified
    input resolution for YOLOv3-608.
    """

    def __init__(self, yolo_input_resolution):
        """Initialize with the input resolution for YOLOv3, which will stay fixed in this sample.

        Keyword arguments:
        yolo_input_resolution -- two-dimensional tuple with the target network's (spatial)
        input resolution in HW order
        """
        self.yolo_input_resolution = yolo_input_resolution

    def process(self, input_image_path):
        """Load an image from the specified input path,
        and return it together with a pre-processed version required for feeding it into a
        YOLOv3 network.

        Keyword arguments:
        input_image_path -- string path of the image to be loaded
        """
        image_raw, image_resized = self._load_and_resize(input_image_path)
        image_preprocessed = self._shuffle_and_normalize(image_resized)
        return image_raw, image_preprocessed

    def _load_and_resize(self, input_image_path):
        """Load an image from the specified path and resize it to the input resolution.
        Return the input image before resizing as a PIL Image (required for visualization),
        and the resized image as a NumPy float array.

        Keyword arguments:
        input_image_path -- string path of the image to be loaded
        """
        # NOTE: the right-hand sides of the next two assignments were cut off in the
        # post; they are filled in here from the docstring and the comment below.
        image_raw = Image.open(input_image_path)
        # Expecting yolo_input_resolution in (height, width) format, adjusting to PIL
        # convention (width, height) in PIL:
        new_resolution = (
            self.yolo_input_resolution[1],
            self.yolo_input_resolution[0])
        image_resized = image_raw.resize(
            new_resolution, resample=Image.BICUBIC)
        image_resized = np.array(image_resized, dtype=np.float32, order='C')
        return image_raw, image_resized

    def _shuffle_and_normalize(self, image):
        """Normalize a NumPy array representing an image to the range [0, 1], and
        convert it from HWC format ("channels last") to NCHW format ("channels first"
        with leading batch dimension).

        Keyword arguments:
        image -- image as three-dimensional NumPy float array, in HWC format
        """
        image /= 255.0
        # HWC to CHW format:
        image = np.transpose(image, [2, 0, 1])
        # CHW to NCHW format
        image = np.expand_dims(image, axis=0)
        # Convert the image to row-major order, also known as "C order":
        image = np.array(image, dtype=np.float32, order='C')
        return image

You changed the aspect ratio in your original code below.

def process_image(self,arr):
    # image = Image.fromarray(np.uint8(arr))
    #image_resized = image.resize(size=(self.model_w, self.model_h), resample=Image.BILINEAR)
    image_resized=cv2.resize(arr,(self.model_w, self.model_h))
    img_np = image_resized.astype(np.float32)
    # HWC -> CHW
    img_np = img_np.transpose((2, 0, 1))
    # Normalize to [0.0, 1.0] interval (expected by model)
    # img_np = (1.0 / 255.0) * img_np
    img_np = img_np.ravel()
    return img_np

Please try to follow the steps I mentioned above.

Yes, I tried this. I am getting negative values in the bbox coordinates with high confidence scores.

[array([442.91855, 561.6644 , 995.78345, 944.9944 ], dtype=float32), array([ 431.9024 , -49.264008, 1008.4338 , 469.98044 ], dtype=float32),
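
Boxes that extend slightly past the image border, as in the output above, can be clamped before drawing. The following is a minimal sketch under the assumption that the coordinates are (x1, y1, x2, y2) in the 1472x960 model input space. The helper name `clip_boxes` is hypothetical. Clamping only tidies the output and does not address the cause of any false positives.

```python
import numpy as np


def clip_boxes(boxes, img_w, img_h):
    """Clamp (x1, y1, x2, y2) boxes to the valid pixel range of an image.

    Hypothetical helper for illustration; returns a new float32 array.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4).copy()
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img_w - 1)  # x coordinates
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img_h - 1)  # y coordinates
    return boxes


# The two boxes quoted in the post above
raw = [[442.91855, 561.6644, 995.78345, 944.9944],
       [431.9024, -49.264008, 1008.4338, 469.98044]]
clipped = clip_boxes(raw, img_w=1472, img_h=960)
print(clipped)
```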

What did you modify in your original code? Can you share the latest one?

As per your suggestion, I changed only the preprocessing code. (11.2 KB)

Can you mention what has been changed in def process_image?
I did not see any change.

Actually, my model input size is 1472x960. So if I resize the image without changing the aspect ratio, the resized image size is 1472x828. How can I then feed this image to the model for inference?
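
One common way to handle this is to pad the aspect-preserving resize up to the model input size (often called letterboxing). The sketch below is only an illustration under stated assumptions: a 16:9 source such as 1920x1080 (the source size is not given in the thread), padding added at the bottom and right, and a pad value of 0. The helper names are hypothetical, and the exact padding placement and value that TLT expects should be taken from the post linked earlier in the thread.

```python
import numpy as np


def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (scale, new_w, new_h) for an aspect-preserving resize into dst."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = int(round(src_w * scale))
    new_h = int(round(src_h * scale))
    return scale, new_w, new_h


def pad_to_model_size(resized, dst_w, dst_h, pad_value=0):
    """Place an HWC image in the top-left corner of a dst_h x dst_w canvas."""
    h, w, c = resized.shape
    canvas = np.full((dst_h, dst_w, c), pad_value, dtype=resized.dtype)
    canvas[:h, :w, :] = resized
    return canvas


def box_to_source(box, scale):
    """Map an (x1, y1, x2, y2) box from model input coordinates back to the source.

    Valid for top-left placement only, where no offset has to be subtracted.
    """
    return [coord / scale for coord in box]


# Assumed 1920x1080 source and the 1472x960 model input from the thread
scale, new_w, new_h = letterbox_geometry(1920, 1080, 1472, 960)
resized = np.zeros((new_h, new_w, 3), dtype=np.float32)  # stand-in for the resized image
padded = pad_to_model_size(resized, 1472, 960)
print(new_w, new_h, padded.shape)
```

In a real script the stand-in array would be replaced by the output of `cv2.resize(arr, (new_w, new_h))`, and the padded array would then go through the same HWC to CHW transpose as before. Because the same scale is applied to both axes, detected boxes are mapped back with a single factor instead of separate `x_scale` and `y_scale` values.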