Problems with delay (real-time object detection)

Hi everyone!

I have a problem that I have been trying to solve for days, but with no success.
The problem is as follows:

Setup:

  • Cheap Logitech web camera
  • Jetson TX1 on the Auvidea J120 interface board

The pipeline goes like this: a frame is read from the camera using OpenCV 3 and
is then sent through the neural network, and the resulting detections are
plotted on the original image (an object detection problem).
I have timed the network inference and it takes around 0.4 s. The problem is
the time difference between the moment a frame is captured and the moment its
processed version appears on screen.

Here is a concrete example to make it clearer:

  1. Image is read from the camera at time t=0s.
  2. Image goes through the Neural Network. This takes 0.4s as mentioned above.
  3. So one full iteration (vid.read(), result = predict(frame), show(result)) takes around 0.4 s.

So although one capture-plus-predict iteration takes only about 0.4 s, the image that I see on the screen lags roughly 2 seconds behind reality.
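For reference, the 0.4 s figure comes from timing one full iteration roughly like this (just a sketch; vid and network are the capture object and model from the code further down):

import time

start = time.time()
ret, frame = vid.read()            # grab a frame
result = network.inference(frame)  # run the detector
print("capture + inference: %.2f s" % (time.time() - start))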

My first thought was that OpenCV was the bottleneck, so I wrote a script that
only reads from the camera and displays the image. There was NO delay
whatsoever, so the problem is not in OpenCV itself.
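
The capture-only test was essentially just this loop (a minimal sketch, not the exact script):

import cv2

vid = cv2.VideoCapture(0)
while True:
    ret, frame = vid.read()
    if not ret:
        break
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
vid.release()
cv2.destroyAllWindows()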

Another speculation is that, because the Jetson is overloaded with network
computations (GPU utilization is at 99%), it cannot read frames fast enough,
and that is where the time difference comes from.

The CPU is not the bottleneck, because CPU utilization does not go over 30% on
any of the cores.

The final test was run on my 2013 MacBook (Intel i7 CPU, 16 GB RAM,
NVIDIA GeForce 650M).
Interestingly, there everything worked as expected (no extra time
difference)!

The image reading and prediction code looks like the following:

import cv2

vid = cv2.VideoCapture(0)
if not vid.isOpened():
    raise IOError("Error opening camera feed!")

# Skip frames until reaching start_frame
if start_frame > 0:
    vid.set(cv2.CAP_PROP_POS_MSEC, start_frame)  # note: CAP_PROP_POS_MSEC is in milliseconds

while True:
    ret, image = vid.read()  # read() returns (success_flag, frame)
    if not ret:
        break
    # Network processing
    result = network.inference(image)

I hope you can help me out because I am all out of ideas :(

Is the Logitech webcam transmitting in a compressed format, or raw? If it is compressed (like MJPEG), OpenCV may not know to use the hardware decoder. You should be able to query and set the format using the V4L2 tools.

I know the Logitech C920 works smoothly in ‘raw’ mode. Which one are you using?
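
For example, "v4l2-ctl --device /dev/video0 --list-formats-ext" lists the pixel formats and frame rates the camera offers. From OpenCV you can also try requesting a specific format through the FOURCC property (a sketch, assuming the camera is device 0; whether the driver honours it depends on the camera):

import cv2

cap = cv2.VideoCapture(0)
# Ask for raw YUYV frames instead of compressed MJPG
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'YUYV'))
print("FOURCC now:", int(cap.get(cv2.CAP_PROP_FOURCC)))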

I am currently using a cheap USB Logitech webcam, the C160.
The command "v4l2-ctl --list-formats-ext" outputs:

ioctl: VIDIOC_ENUM_FMT
Index       : 0
Type        : Video Capture
Pixel Format: 'MJPG' (compressed)
Name        : MJPEG
	Size: Discrete 160x120
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 176x144
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 320x240
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 352x288
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 640x480
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)

Index       : 1
Type        : Video Capture
Pixel Format: 'YUYV'
Name        : YUV 4:2:2 (YUYV)
	Size: Discrete 160x120
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 176x144
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 320x240
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 352x288
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)
	Size: Discrete 640x480
		Interval: Discrete 0.033s (30.000 fps)
		Interval: Discrete 0.040s (25.000 fps)
		Interval: Discrete 0.050s (20.000 fps)
		Interval: Discrete 0.067s (15.000 fps)
		Interval: Discrete 0.100s (10.000 fps)
		Interval: Discrete 0.200s (5.000 fps)

Could you please be more specific about what I should try next?

Hi, it should be something like this:

v4l2-ctl --device /dev/video0 --set-fmt-video=width=640,height=480,pixelformat=YUYV

sudo may be required, and your application will then need to do the YUV->RGB conversion itself.
See here for V4L2 commands: http://trac.gateworks.com/wiki/linux/v4l2#pixelformatsframesizesandframerates
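
The conversion itself is cheap; assuming you end up with a raw YUYV (YUV 4:2:2) buffer, it looks roughly like this in OpenCV (raw_bytes below is only a placeholder for one frame as delivered by V4L2):

import numpy as np
import cv2

w, h = 640, 480
raw_bytes = bytes(w * h * 2)                      # placeholder: YUYV is 2 bytes per pixel
yuyv = np.frombuffer(raw_bytes, np.uint8).reshape(h, w, 2)
bgr = cv2.cvtColor(yuyv, cv2.COLOR_YUV2BGR_YUY2)  # YUYV -> BGR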

Sorry for the delayed response. I tried this and it didn’t help. I also tried compiling OpenCV with GStreamer support, but that didn’t help either.

I tried using a different camera, and I am having the same delay whenever I run the machine learning script.
Let me clarify again: the camera works perfectly with the Cheese app, or with the same script without the machine learning part. The problem starts when I put the Jetson under high GPU load; even though the code itself runs fast (0.3 s to grab a frame and predict), the image I see is quite “old”. I noticed that if I run the “maxpref” (overclock) script I get a shorter delay (from, let’s say, 3 s down to maybe 2 s), but it is still too much.

I still don’t understand how I can see 2-3 frames per second, yet the frames I see are delayed by about 3 seconds :S

Hi,

First, to clarify: do you use OpenCV + Caffe?

We found that a CPU-based cvtColor(in, out, CV_RGBA2BGR) conversion may cause a delay.
Another possible reason is a memory copy between the camera read and the inference.

Could you test our 11_camera_object_identification sample in MMAPI to check whether it also shows a delay?

Hey,

I am using TensorFlow + OpenCV.

I managed to find the problem. Our whole program was one big while True loop in which both frame grabbing and prediction happened. I still don’t know exactly why, but when we created a separate thread used just for fetching new frames with OpenCV, the delay disappeared.

So the only real problem was doing the OpenCV frame fetching and the prediction in the same thread; see the sketch below.
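
For anyone who hits the same issue, here is a minimal sketch of the kind of grabber thread that fixed it for us (network.inference is the placeholder for your own model call, as in the code above): the background thread keeps reading frames so stale frames never pile up, and the main loop always processes the most recent one.

import threading
import cv2

class CameraGrabber:
    """Reads frames in a background thread and keeps only the latest one."""

    def __init__(self, device=0):
        self.cap = cv2.VideoCapture(device)
        if not self.cap.isOpened():
            raise IOError("Error opening camera feed!")
        self.lock = threading.Lock()
        self.latest = None
        self.running = True
        self.thread = threading.Thread(target=self._update, daemon=True)
        self.thread.start()

    def _update(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.latest = frame

    def read(self):
        with self.lock:
            return None if self.latest is None else self.latest.copy()

    def stop(self):
        self.running = False
        self.thread.join()
        self.cap.release()

grabber = CameraGrabber(0)
while True:
    image = grabber.read()
    if image is None:
        continue
    result = network.inference(image)  # placeholder for the actual model call
    # draw / show result here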