hstack()/vstack() affects object detection

Hi, I am combining the images from two cameras side by side to make a 200-degree-view object detection system. I am using ssd-mobilenet-v2, but when I use hstack() to combine the video arrays, the AI doesn't detect me unless I am 1 foot away from the camera. However, if I use vstack(), it detects me from 9 feet away. Any idea why?

import jetson.inference
import jetson.utils
import time
import cv2
import numpy as np
import math
import os
import pyttsx3
engine=pyttsx3.init()
engine.setProperty('rate',150)
engine.setProperty('voice','english+m1')
text='initialising'
text1=' Powering down'
engine.say(text)
engine.runAndWait()

timeStamp=time.time()
fpsFilt=0
reset=0

net=jetson.inference.detectNet('ssd-mobilenet-v2',threshold=.5)

frame_counter=0
font=cv2.FONT_HERSHEY_SIMPLEX

print(cv2.__version__)
dispW=1280
dispH=720
dtav=0

cam0=cv2.VideoCapture(-1)
cam0.set(cv2.CAP_PROP_FRAME_WIDTH, dispW)
cam0.set(cv2.CAP_PROP_FRAME_HEIGHT, dispH)

cam1=cv2.VideoCapture(1)
cam1.set(cv2.CAP_PROP_FRAME_WIDTH, dispW)
cam1.set(cv2.CAP_PROP_FRAME_HEIGHT, dispH)
while True:
    print('true')
    _,img0 = cam0.read()
    _,img1 = cam1.read()
    #img0=cv2.resize(img0,(1280,720))
    #img1=cv2.resize(img1,(1280,720))
    img=np.vstack((img0,img1))#<<<<<<<<<<<<<<<<<<<<<<<<<

    height=img.shape[0]
    width=img.shape[1]

    frame=cv2.cvtColor(img,cv2.COLOR_BGR2RGBA).astype(np.float32)
    frame=jetson.utils.cudaFromNumpy(frame)

    detections=net.Detect(frame, width, height)
    for detect in detections:
        #print(detect)
        ID=detect.ClassID
        top=int(detect.Top)
        left=int(detect.Left)
        bottom=int(detect.Bottom)
        right=int(detect.Right)
        item=net.GetClassDesc(ID)
        cv2.rectangle(img,(left,top),(right,bottom),(0,225,0),1)
        cv2.putText(img,item,(left,top+20),font,.5,(0,255,0),2)
        if item=='person':
            print('person')

    #display.RenderOnce(img,width,height)
    dt=time.time()-timeStamp
    timeStamp=time.time()
    fps=1/dt
    fpsFilt=.9*fpsFilt + .1*fps
    #print(str(round(fps,1))+' fps')
    cv2.putText(img,str(round(fpsFilt,1))+' fps',(0,30),font,1,(0,0,255),2)
    reset=reset+1

    cv2.imshow('nanoCam',img)
    if cv2.waitKey(1)==ord('q'):
        break

cam1.release()
cam0.release()
cv2.destroyAllWindows()

Hi,

Based on the detectNet source, net.Detect will first resize the input buffer to the model's input size (cudaTensorNormBGR).

For example, suppose each camera frame is 640x480 and the network input is also 640x480.
With hstack(), the detector squeezes a 1280x480 combined input into 640x480, halving every object's width.
So it requires you to be close enough that the scaled bounding box is still meaningful (e.g. width > 20 px).

In contrast, with vstack() the detector converts a 640x960 input to 640x480, halving the height instead.
This also affects accuracy, since the aspect ratio changes significantly.
But it seems the detector tolerates much more variance in height.
(This makes sense, since the range of object heights is much larger than the range of widths.)
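
To put numbers on it, here is a small sketch of the scaling. The 640x480 network size and the 60x240 px person box are assumed figures for illustration, not measurements:

# Hypothetical example: how a 60x240 px person in one raw 640x480 frame
# scales when the stacked image is resized to a 640x480 network input.
net_w, net_h = 640, 480            # assumed network input size
person_w, person_h = 60, 240       # assumed person box in one raw frame

# hstack: two 640x480 frames -> 1280x480, so width is halved on resize
hstack_w = person_w * net_w / 1280  # 30 px
hstack_h = person_h * net_h / 480   # 240 px

# vstack: two 640x480 frames -> 640x960, so height is halved on resize
vstack_w = person_w * net_w / 640   # 60 px
vstack_h = person_h * net_h / 960   # 120 px

print(f"hstack: {hstack_w:.0f}x{hstack_h:.0f} px, vstack: {vstack_w:.0f}x{vstack_h:.0f} px")

With hstack() the person shrinks to half width at the same distance, so you have to stand much closer before the box is large enough to detect.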

We recommend feeding img0 and img1 to the detector separately to solve this issue.
After inference, you can combine the outputs to get the result for the full 200 degrees.
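
For example, here is a minimal sketch of that approach, reusing the variable names (net, font, etc.) from your script above; it has not been tested on a device:

# Minimal sketch: run Detect() on each camera frame at its native
# resolution, then shift the right camera's boxes by the left frame's
# width so everything can be drawn on the combined hstack() image.
def detect_frame(net, img):
    h, w = img.shape[:2]
    frame = cv2.cvtColor(img, cv2.COLOR_BGR2RGBA).astype(np.float32)
    cuda_frame = jetson.utils.cudaFromNumpy(frame)
    return net.Detect(cuda_frame, w, h)

detections0 = detect_frame(net, img0)
detections1 = detect_frame(net, img1)

img = np.hstack((img0, img1))
offset = img0.shape[1]  # x-offset for boxes from the right camera

for dets, dx in ((detections0, 0), (detections1, offset)):
    for d in dets:
        left, right = int(d.Left) + dx, int(d.Right) + dx
        top, bottom = int(d.Top), int(d.Bottom)
        cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0), 1)
        cv2.putText(img, net.GetClassDesc(d.ClassID), (left, top + 20),
                    font, .5, (0, 255, 0), 2)

This way the detector always sees each person at full resolution in whichever view they appear, and the offset only shifts the right camera's boxes for display.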

Thanks.

I understand, thanks.