I have a problem that I have been trying to solve for days now, with no success.
The problem goes as follows:
Setup:
Cheap Logitech web camera
Jetson TX1 on the Auvidea J120 interface board
The pipeline goes like this: an image is read from the camera using OpenCV 3 and
then sent to the neural network, which produces the final results that can be
plotted on the original image (an object-detection problem).
I have timed the network inference and it is around 0.4 s. The problem is
the time difference between the captured image and the processed image.
Here’s a concrete example to make it more clear:
Image is read from the camera at time t=0s.
Image goes through the Neural Network. This takes 0.4s as mentioned above.
So computing vid.read(), result = predict(frame), show(result) takes around 0.4 s in total.
So although my model takes 0.4 s to grab an image and predict, the output image that I see on the screen is delayed by about 2 seconds.
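To separate the per-stage cost from the end-to-end latency, each stage can be timestamped individually. This is only a hedged sketch: read_frame and predict below are hypothetical stand-ins for the real vid.read() and network.inference() calls, with sleeps simulating their cost.

```python
import time

# Hypothetical stand-ins for the real capture and inference steps;
# replace with vid.read() and network.inference() in the real script.
def read_frame():
    time.sleep(0.01)           # pretend camera read
    return "frame"

def predict(frame):
    time.sleep(0.05)           # pretend network inference
    return "result"

timings = {}
t0 = time.perf_counter()
frame = read_frame()
timings["read"] = time.perf_counter() - t0

t1 = time.perf_counter()
result = predict(frame)
timings["inference"] = time.perf_counter() - t1

timings["total"] = time.perf_counter() - t0
print({k: round(v, 3) for k, v in timings.items()})
```

If every stage individually sums to ~0.4 s but the picture on screen is 2 s old, the extra latency is being added before read_frame() ever sees the frame, which points at buffering in the capture path rather than at the loop itself.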
My first thought was that OpenCV was the bottleneck, so I wrote a script
that only reads from the camera and displays the image. There was NO
bottleneck whatsoever, so the problem is not in OpenCV.
Another speculation is that because the Jetson is overloaded with network
computations (GPU utilization is at 99%), it cannot read images fast enough, so
the time difference appears.
The CPU is not the bottleneck, because CPU utilization doesn't go over 30% on
any core.
The final test was run on my 2013 MacBook (Intel i7 CPU, 16 GB RAM,
Nvidia GeForce 650M).
The interesting part is that everything was working as expected (no extra time
difference)!
Image reading and prediction code looks like the following:
vid = cv2.VideoCapture(0)
if not vid.isOpened():
    raise IOError("Error opening camera feed!")

# Skip frames until reaching start_frame
if start_frame > 0:
    vid.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
...
while True:
    ret, image = vid.read()
    # Network processing
    result = network.inference(image)
I hope you can help me out because I am all out of ideas :(
Is the Logitech webcam transmitting in a compressed format, or raw? If compressed (like MJPEG), OpenCV may not know to use the hardware decoder. You should be able to query and set the format using V4L2 tools (e.g. v4l2-ctl).
I know the Logitech C920 works smoothly in ‘raw’ mode. Which one are you using?
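For reference, a sketch of how the pixel format is identified and requested. The fourcc helper below just recomputes the packed FOURCC integer that V4L2 (and OpenCV's cv2.VideoWriter_fourcc) uses; the vid.set call at the end is shown as a comment, since it assumes an attached camera and that the device actually supports the requested format (check with `v4l2-ctl --list-formats-ext`).

```python
def fourcc(c1, c2, c3, c4):
    """Pack four characters into the little-endian FOURCC integer
    that V4L2 and OpenCV use to identify pixel formats."""
    return ord(c1) | ord(c2) << 8 | ord(c3) << 16 | ord(c4) << 24

MJPG = fourcc("M", "J", "P", "G")   # compressed Motion-JPEG
YUYV = fourcc("Y", "U", "Y", "V")   # uncompressed ("raw") 4:2:2

print(hex(MJPG), hex(YUYV))

# With a camera attached, the format would be requested like this
# (assumed to match the OpenCV property API):
#
#   vid = cv2.VideoCapture(0)
#   vid.set(cv2.CAP_PROP_FOURCC, MJPG)
```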
Sorry for the delayed response. I tried this and it didn't help. I also tried compiling OpenCV with GStreamer support, but that didn't help either.
I tried using a different camera, and again I get the same delay when I use the machine-learning script.
Let me clarify again: the camera works perfectly when I use the Cheese app, or the same script without the machine-learning part. The problem starts when I put the Jetson under high GPU pressure: even though the code runs fast (0.3 s to take a photo and predict), the image I see is quite "old". I noticed that if I use the "maxpref" (overclock) script I get a shorter delay (from, say, 3 s down to maybe 2 s), but it is still too much.
I still don't understand how I can see 2-3 pictures per second, yet the pictures I see are delayed by 3 seconds. :S
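These numbers are consistent with driver-side frame buffering: the camera keeps filling a small internal queue of buffers at ~30 fps, read() pops the oldest entry, and a consumer that reads every 0.4 s therefore always sees a frame that is roughly queue-depth reads old. A toy simulation of that mechanism (pure Python; the queue depth of 5 is a hypothetical buffer count, not a value from the Jetson driver):

```python
from collections import deque

FRAME_INTERVAL = 1 / 30    # camera produces a frame every ~33 ms
PROCESS_TIME = 0.4         # seconds per read+inference, as measured
QUEUE_DEPTH = 5            # hypothetical driver buffer count

queue = deque()
now = 0.0
next_capture = 0.0
latency = None

for step in range(40):                   # simulate 40 consumer iterations
    # Camera side: fill free buffer slots; frames arriving while the
    # queue is full are dropped.
    while next_capture <= now:
        if len(queue) < QUEUE_DEPTH:
            queue.append(next_capture)   # store the capture timestamp
        next_capture += FRAME_INTERVAL
    # Consumer side: read() pops the OLDEST queued frame.
    captured_at = queue.popleft()
    latency = now - captured_at
    now += PROCESS_TIME

print(f"steady-state latency ~ {latency:.2f}s")
```

With these numbers the simulation settles at roughly QUEUE_DEPTH * PROCESS_TIME (~2 s), while each loop iteration still only "costs" 0.4 s — exactly the pattern described above: a few pictures per second, each of them seconds old.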
We found that the cvtColor(in, out, CV_RGBA2BGR) CPU-based conversion may cause a delay.
Another possible reason is a memory copy between the camera read and inference.
Could you test our 11_camera_object_identification sample in MMAPI to check whether there is also a delay?
I managed to find the problem. Our whole program was one big while(True) loop where both picture loading and prediction happened. I still don't know exactly why, but when we created a new thread used just for fetching new frames with OpenCV, the delay disappeared.
So the only problem was that fetching a new frame with OpenCV while simultaneously doing prediction in the same thread was not a smart thing to do.
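The likely reason the extra thread helps: the capture keeps an internal frame queue, and a loop that reads slower than the camera produces gets ever-staler frames from that queue. A dedicated thread that reads continuously and keeps only the newest frame drains the queue, so the slow inference loop always sees a fresh image. A minimal sketch of that pattern — the camera is replaced here by a hypothetical counter-based source so it runs anywhere; with OpenCV you would pass something like cv2.VideoCapture(0).read as the source instead:

```python
import threading
import time

class LatestFrameGrabber:
    """Continuously read from a source and keep only the newest frame."""

    def __init__(self, read_fn):
        self._read = read_fn           # e.g. cv2.VideoCapture(0).read
        self._lock = threading.Lock()
        self._frame = None
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while self._running:
            ok, frame = self._read()
            if ok:
                with self._lock:
                    self._frame = frame  # overwrite: old frames are discarded

    def read(self):
        """Return the most recent frame (None until the first arrives)."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        self._thread.join()

# Hypothetical stand-in source: produces an incrementing frame number.
counter = [0]
def fake_read():
    counter[0] += 1
    time.sleep(0.001)                  # fast fake camera
    return True, counter[0]

grabber = LatestFrameGrabber(fake_read)
time.sleep(0.05)                       # pretend a slow inference is running
latest = grabber.read()
grabber.stop()
print("latest frame:", latest)
```

Because the grabber overwrites rather than queues, the inference loop can run at any speed without latency accumulating — which matches the behavior described above.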