Inferring Yolo_v3.trt model in Python

@Morganh
Actually, my model input size is 1472×960, so if I resize the image without changing the aspect ratio, the resized image is 1472×828. How can I then feed this image to inference?

You can consider it as padding. Please refer to the steps in Discrepancy between results from tlt-infer and trt engine - #8 by Morganh again.
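For example, with a 1472×960 model input, an aspect-ratio-preserving resize to 1472×828 leaves 132 rows to fill. A minimal sketch of that padding step, assuming a zero fill value and top-left placement (both are choices, not requirements):

import cv2
import numpy as np

def pad_to_model_input(image, model_w=1472, model_h=960):
    # Resize so the image fits inside the model input without changing aspect ratio.
    img_h, img_w = image.shape[:2]
    ratio = min(model_w / float(img_w), model_h / float(img_h))
    new_w, new_h = int(round(img_w * ratio)), int(round(img_h * ratio))
    resized = cv2.resize(image, (new_w, new_h))
    # Paste into a model-sized canvas; the leftover area is the padding.
    canvas = np.zeros((model_h, model_w, 3), dtype=np.uint8)
    canvas[:new_h, :new_w, :] = resized
    return canvas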


@Morganh Are these steps correct?

image = cv2.imread(imname, cv2.COLOR_BGR2RGB)  # note: imread's second argument is a read flag, not a color-conversion code, so this does NOT convert BGR to RGB
image_resized = imutils.resize(image, width=self.model_w)
new_image = np.zeros((self.model_h, self.model_w, image_resized.shape[2]), np.uint8)
new_image[0:image_resized.shape[0], 0:image_resized.shape[1], :] = image_resized
img_np = new_image.astype(np.float32)
# HWC -> CHW
img_np = img_np.transpose((2, 0, 1))
img_np = preprocess_input(img_np)
img_np = img_np.ravel()

Can you review your code yourself? My suggestion is already mentioned above.


Yes, Morganh. I double-checked my code and changed the preprocessing as you suggested, but I am still getting negative values in the bbox coordinates. Can you please tell me why the bbox coordinates come out negative?
I have attached the preprocessing steps here:

def _preprocess_yolo(self, img, letter_box=True):
    """Preprocess an image before TRT YOLO inferencing.

    # Args
        img: uint8 numpy array of shape (img_h, img_w, 3)
        input_shape: a tuple of (H, W)
        letter_box: boolean, specifies whether to keep aspect ratio and
                    create a "letterboxed" image for inference

    # Returns
        preprocessed img: float32 numpy array of shape (3, H, W)
    """
    input_shape = (self.model_h, self.model_w)
    if letter_box:
        img_h, img_w, _ = img.shape
        new_h, new_w = input_shape[0], input_shape[1]
        offset_h, offset_w = 0, 0
        if (new_w / img_w) <= (new_h / img_h):
            new_h = int(img_h * new_w / img_w)
            offset_h = (input_shape[0] - new_h) // 2
        else:
            new_w = int(img_w * new_h / img_h)
            offset_w = (input_shape[1] - new_w) // 2
        resized = cv2.resize(img, (new_w, new_h))
        img = np.full((input_shape[0], input_shape[1], 3), 127, dtype=np.uint8)
        img[offset_h:(offset_h + new_h), offset_w:(offset_w + new_w), :] = resized
    else:
        img = cv2.resize(img, (input_shape[1], input_shape[0]))

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = preprocess_input(img)
    return img.ravel()
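The raveled array would then be copied into the input binding's host buffer before calling do_inference; a brief sketch, assuming the inputs[0].host buffer layout used in the standard TensorRT Python samples:

# Copy the flattened CHW float32 image into the first input binding's
# page-locked host buffer (layout assumed from the standard TRT samples).
np.copyto(inputs[0].host, self._preprocess_yolo(img))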

In your code, the ratio can be min(self.model_w / float(img_w), self.model_h / float(img_h)). Then:

new_w = int(round(img_w * ratio))
new_h = int(round(img_h * ratio))

(For example, with a 1472×960 model and a 1920×1080 image, ratio = min(1472/1920, 960/1080) ≈ 0.767, so the resized image is 1472×828, matching the size mentioned earlier.)

Please try to resize the image via PIL instead of cv2:

im = img.resize((new_w, new_h), Image.ANTIALIAS)

inf_img = Image.new('RGB', (self.model_w, self.model_h))
inf_img.paste(im, (0, 0))

Thank you, Morganh.
Yes, I did the same, but no luck; I am still getting negative values.

img_w = arr.size[0]
img_h = arr.size[1]

ratio = min(self.model_w / float(img_w), self.model_h / float(img_h))

new_w = int(round(img_w * ratio))
new_h = int(round(img_h * ratio))

im = arr.resize((new_w, new_h), Image.ANTIALIAS)

inf_img = Image.new('RGB', (self.model_w, self.model_h))
inf_img.paste(im, (0, 0))
inf_img = np.array(inf_img).astype(np.float32)
inference_input = preprocess_input(inf_img.transpose(2, 0, 1))
inference_input = inference_input.ravel()

Can you try inference_input = inf_img.transpose(2, 0, 1) instead?

Yes, I tried. The results are different, but I am still getting negative values.

For the previous one, with preprocess_input, I got these results:
[[ 513 -144 968 960]
[ 372 1157 786 1982]
[ 376 309 839 1234]
[ 424 1017 897 2002]
[ -55 831 454 2059]
[ 897 338 1237 1644]
[ 338 -174 875 983]
[ 807 854 1284 2054]
[ 822 -180 1273 990]]

And now, without preprocess_input, I get this:
[[ 475 1154 897 1989]
[ 355 298 800 1236]
[ 283 -141 719 965]
[ 339 1007 826 2008]
[ 9 227 351 1480]
[ 389 -168 931 982]
[ -10 -159 371 996]
[ 882 196 1243 1492]
[ 813 823 1280 2071]]

Negative coordinates appear in both.

What is the meaning of the above result? What did you print?

Bounding box coordinates [[x1, y1, x2, y2], […]…] of the detections.

Please modify your code to:

x_scale = float(img_shape[1]) / float(model_w)
y_scale = float(img_shape[0]) / float(model_h)
max_scale = max(x_scale, y_scale)

for i in range(p_keep_count[0]):
    assert p_classes[i] < len(analysis_classes)
    if p_scores[i] > threshold:
        x1 = int(np.round(p_bboxes[i][0] * max_scale))
        y1 = int(np.round(p_bboxes[i][1] * max_scale))
        x2 = int(np.round(p_bboxes[i][2] * max_scale))
        y2 = int(np.round(p_bboxes[i][3] * max_scale))

I ran a standalone script against one KITTI image (004987.png).
It detects bboxes as below:
[625 176 646 191]
[784 179 841 211]
[438 173 471 185]
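Putting the rescaling above together with clipping: negative x1/y1 values usually just mean the predicted box extends past the image border, so they can be clipped to the image bounds. A minimal sketch; the np.clip step is an addition here, not part of the original suggestion:

import numpy as np

def rescale_boxes(p_bboxes, img_shape, model_w, model_h):
    # Boxes were predicted in the top-left-pasted, letterboxed model space,
    # so scaling by the larger axis ratio maps them back to the original image.
    max_scale = max(float(img_shape[1]) / float(model_w),
                    float(img_shape[0]) / float(model_h))
    boxes = np.round(np.asarray(p_bboxes, dtype=np.float32) * max_scale).astype(int)
    # Clip so partially out-of-frame detections stay inside the image.
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img_shape[1] - 1)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img_shape[0] - 1)
    return boxes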


Yes, that is a change in the postprocessing, but I got these results before postprocessing.

I am getting the above results from here:

detection_out = self.do_inference(
    self.context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream
)
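For reference, detection_out is a list of flat host buffers that would then be unpacked into the arrays used in the postprocessing snippet above. A hedged sketch, assuming the BatchedNMS-style binding order of keep_count, boxes, scores, classes; verify against your engine's actual bindings:

# Assumed output binding order (BatchedNMS convention): keep_count, boxes, scores, classes.
# The real order depends on how the engine was built; check engine.get_binding_name().
p_keep_count, p_bboxes, p_scores, p_classes = detection_out
p_bboxes = np.asarray(p_bboxes).reshape(-1, 4)  # flat buffer -> (N, 4) boxes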

Please try the same image as mine, KITTI image (004987.png).
I can run inference against it without issue.

OK. Could you please check my model in your script? Shall I attach my model and input image here?

I suggest downloading the models from NVIDIA and running a sample image.

Or you can quickly train a yolo_v3 model on the KITTI dataset, then run inference against KITTI image (004987.png).


Hi @Morganh, I had the same issue: the images for which detection was accurate with tlt-infer gave false positives with the Python code. When inferring with tlt-infer, I used:

nms_config {
  confidence_threshold: 0.96
  clustering_iou_threshold: 0.6
  top_k: 200
}

And it eliminated all the false positives and gave good results. The exported model was also tested with DeepStream and gave good results as well; I am not sure what happens with the Python code. I used code similar to @jothi.ramasubramanian's.
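To mirror that nms_config on the Python side, the same confidence threshold can be applied when looping over the detections. A minimal sketch using the variable names from the earlier snippet; the 0.96 value is simply carried over from the config above:

confidence_threshold = 0.96  # same value as in the tlt-infer nms_config
kept = []
for i in range(p_keep_count[0]):
    # Keep only high-confidence detections; the rest are the likely false positives.
    if p_scores[i] > confidence_threshold:
        kept.append((p_bboxes[i], p_scores[i], p_classes[i]))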

As mentioned above, can you train a yolo_v3 model against the public KITTI dataset and retry?