Next step to get faster with Xavier? USB V4L2 camera, OpenCV 4 CUDA build, dlib, YOLO

I am an ophthalmologist and wrote Python code for investigating eye movements.

At my desktop:

The code (Python, OpenCV 4.3 with CUDA and cuDNN, dlib 19, and tiny YOLOv3)
runs on my desktop (i5-6600 CPU at 3.5 GHz, 32 GB RAM, GTX 1080 Ti)
with a USB 4K camera at these average speeds:

at 3840x2160 resolution:
-when only dlib runs: 0.085 sec (about 12 FPS)
-when tiny YOLOv3 is added for both eyes separately: 0.22 sec (about 4.5 FPS)

at 1920x1080 resolution:
-when only dlib runs: 0.036 sec (about 27.8 FPS)
-when everything runs (tiny YOLOv3 for both eyes separately): 0.17 sec (about 5.8 FPS)

It seems that 5 or even 4 FPS is ok for my work.

At Jetson Xavier:

Now I move to my Jetson Xavier (MAXN mode) with the same camera and the same code. OpenCV 4.3 is built from source with CUDA and cuDNN, dlib is built with DLIB_USE_CUDA=ON, and the GStreamer pipeline is one of:
gst_url="v4l2src device=/dev/video0 do-timestamp=false ! image/jpeg, width=3840, height=2160, framerate=30/1 ! jpegdec ! videoconvert ! appsink drop=True max-lateness=1 enable-last-sample=True max-buffers=1"
or
gst_url="v4l2src device=/dev/video0 ! video/x-raw,framerate=30/1 ! videoscale ! videoconvert ! appsink"
or
gst_url="v4l2src device=/dev/video0 ! jpegdec ! videoconvert ! appsink"

There is no significant difference between them:
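For reference, each of these strings is opened with OpenCV's GStreamer backend. A small helper (the function name and defaults are mine) that builds the first pipeline:

```python
def make_pipeline(device="/dev/video0", width=3840, height=2160, fps=30):
    """Build the MJPEG capture pipeline string for cv2.VideoCapture."""
    return (
        f"v4l2src device={device} do-timestamp=false ! "
        f"image/jpeg, width={width}, height={height}, framerate={fps}/1 ! "
        "jpegdec ! videoconvert ! "
        "appsink drop=True max-lateness=1 enable-last-sample=True max-buffers=1"
    )

# Usage (requires an OpenCV build with GStreamer support):
# import cv2
# cap = cv2.VideoCapture(make_pipeline(), cv2.CAP_GSTREAMER)
```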

at 3840x2160 resolution:
-when only dlib runs: 0.068 sec (about 14.7 FPS)
-when tiny YOLOv3 is added for both eyes separately: 0.32 sec (about 3.1 FPS)

at 1920x1080 resolution:
-when only dlib runs: 0.029 sec (about 34.5 FPS)
-when everything runs (tiny YOLOv3 for both eyes separately): 0.27 sec (about 3.7 FPS)

Additionally, I have a serious camera lag problem. The FPS counter looks fine, but the displayed frames lag behind reality. appsink drop=True max-lateness=1 enable-last-sample=True max-buffers=1 does not have a significant effect, cap.set(cv2.CAP_PROP_FPS, 5) has minimal effect, and threading the OpenCV capture had no significant effect either.
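The threaded capture I tried follows the usual latest-frame pattern, roughly like this (the class name is mine; the source is anything with a cv2.VideoCapture-style read() method):

```python
import threading

class LatestFrameReader:
    """Drain a capture source on a background thread, keeping only the
    newest frame so the consumer never sees stale, buffered frames
    (the usual cause of camera lag)."""

    def __init__(self, cap):
        # `cap` is e.g. a cv2.VideoCapture opened with a GStreamer pipeline.
        self.cap = cap
        self._lock = threading.Lock()
        self._ok, self._frame = False, None
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Read as fast as the source allows; older frames are discarded.
        while self._running:
            ok, frame = self.cap.read()
            with self._lock:
                self._ok, self._frame = ok, frame

    def read(self):
        # Return the most recent frame seen so far.
        with self._lock:
            return self._ok, self._frame

    def stop(self):
        self._running = False
        self._thread.join()
```

The idea is that the background thread keeps emptying the driver's queue so read() always returns the freshest frame, but in my case this did not noticeably help.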

Options:

1- Python and OpenCV are amateur tools. NVIDIA is a more professional company. As a doctor you should go and do your job and leave this to professional hands; let the code be written from scratch with C++, GStreamer, TensorRT, and more fancy NVIDIA stuff.

2- Change your camera. A CSI camera would work with nvarguscamerasrc (instead of v4l2src), and you could easily use NVMM memory so the GPU does the work and everything gets faster, without your buffer lag. Of course you need a CSI carrier board; luckily there are cheap single-camera CSI carrier boards.

3- Optimizing the GStreamer pipeline is enough. Even starting from v4l2src, you can route frames through GPU memory and work faster on Jetson-family embedded boards.

4- Change OpenCV settings and the OpenCV code… or rebuild OpenCV with additional flags (in addition to CUDA, cuDNN, V4L2, etc., add OpenGL support or something else).

5- Work toward a faster YOLO; that is the bottleneck. Learn NVIDIA stuff such as the DeepStream SDK and TensorRT. Besides, this is ultimately an NVIDIA forum; maybe you should not ask about other stuff, such as GStreamer or OpenCV, as there are legal issues?

6- You forget the main thing… which is…

Which road should I take? I would appreciate your help. Best regards.

You’re already doing the fancy stuff my friend.

> gst_url="v4l2src device=/dev/video0 do-timestamp=false ! image/jpeg, width=3840, height=2160, framerate=30/1 ! jpegdec ! videoconvert ! appsink drop=True max-lateness=1 enable-last-sample=True max-buffers=1"
> or
> gst_url="v4l2src device=/dev/video0 ! video/x-raw,framerate=30/1 ! videoscale ! videoconvert ! appsink"
> or
> gst_url="v4l2src device=/dev/video0 ! jpegdec ! videoconvert ! appsink"

^^ You are using GStreamer above ^^

I recommend you do two things on the Xavier.

  1. Change your pipeline string to this:

gst_url="v4l2src device=/dev/video0 ! video/x-raw(memory:NVMM), width=1280, height=720, format=I420, framerate=30/1 ! nvvidconv flip-method=0 ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink sync=false drop=true max-buffers=10"

It should work. I do not have a Xavier on hand and I'm not sure what your processing looks like, so if it requires JPEGs as input (it shouldn't; it's slower that way), then you'll want to put nvjpegdec in the pipeline before nvvidconv.
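For example, an untested guess at an MJPEG variant (check which caps your camera actually offers with v4l2-ctl --list-formats-ext):

```
gst_url="v4l2src device=/dev/video0 ! image/jpeg, width=1280, height=720, framerate=30/1 ! nvjpegdec ! video/x-raw ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink sync=false drop=true max-buffers=10"
```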

  2. Go have a looksy at the DeepStream SDK: download it and use the YOLO plugin. To get the most out of your Jetson Xavier, this is what I recommend. The SDK is user-friendly and has a lot of community support, plus you seem informed enough to take on the more "advanced" NVIDIA stuff.

After you get the yolo-plugin working with your camera setup, you can add in dlib via an appsrc/appsink pipeline or (the more advanced route) step out of the ol' comfort zone and write a custom classifier using the gst-dsexample plugin (provided in the DeepStream SDK)…

If you go the more advanced route you’ll have excellent performance and on the xavier you would easily be able to reach 30fps.

Anyways best of luck to you my friend, don’t give up.

You’re an inspiration to us all for being an ophthalmologist and a developer at the same time, it really is impressive.


Option #3

Stand on the shoulders of giants

I'm not sure if you've seen this yet, or if it's like anything you are doing, but here you go…


Thank you for the suggestions.

gst_url=“v4l2src device=/dev/video0 ! video/x-raw(memory:NVMM), width=1280, height=720,format=I420, framerate=30/1 ! nvvideoconv flip-method=0 ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink sync=false drop=true max-buffers=10"

No, this (and some variants of it) does not work with my ELP IMX317 camera.

  2. Thanks again for the second suggestion and the motivation, and thanks to NVIDIA for the latest JetPack 4.4 with DeepStream 5.0 supporting Python. At least I can see that the camera can work without lag.
    Though I could not extract the working GStreamer pipeline that the DeepStream Python sample was using… it would be nice if it could print the pipeline :)

The DeepStream Python examples work, but can I put my own tiny_yolo.cfg and .weights files into DeepStream? I guess I must convert them to ONNX and then to TensorRT; is that right? Can I use my YOLO weights in DeepStream? How can I then integrate this into my Python code? Is there any way to receive an image with something like image = cap.read(), so that I can forward it to my PyQt5 GUI?
Sorry for all the questions, but the answers might serve enthusiasts like me trying to integrate Python code on Jetsons.

Thanks for option 3 :). I have seen that great project, but I only need the camera pipeline, dlib, and my trained YOLO weights to work as they should, since the code already works perfectly on a desktop and already gets the exact results that I want. So somehow I have to integrate my Python code on the Jetson, if possible. Is that possible in Python, in the end, without using OpenCV?

Sorry for the late reply,

I will take some time later and try to put together a parse-launch string that will work for you.

You should be able to just slap your own tiny-yolo weights and tiny-yolo.cfg into the config file under the objectDetector_Yolo directory in the DS 5.0 repository. It should run just fine with no issues. You can convert your custom YOLO model to ONNX and then to TensorRT if you want a slight performance boost, but since you are only using one camera it probably is not needed.
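From memory, the relevant lines in the YOLO config file (e.g. config_infer_primary_yoloV3_tiny.txt; double-check the property names against your copy of the SDK) look roughly like:

```
[property]
custom-network-config=yolov3-tiny.cfg
model-file=yolov3-tiny.weights
```

Point those two paths at your own .cfg and .weights files.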

Putting the video feed coming out of DS 5.0 into your own GUI should be rather straightforward, and I can walk you through it. If you share part of your GUI with me, I can show you exactly what to do and where to place everything.

Essentially you'll be replacing the nveglglessink in deepstream-app with an appsink, but there are some things we will need to do to pull the buffers from the appsink correctly. That's what I'll be showing you after you share the PyQt5 project.
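Until then, here is the rough shape of the buffer-to-array step (a sketch; the helper name is mine, and the real callback also has to map and unmap the GstBuffer):

```python
# The GStreamer side would register a "new-sample" callback on the appsink:
#
#   appsink.set_property("emit-signals", True)
#   appsink.connect("new-sample", on_new_sample)
#
# and on_new_sample would map the GstBuffer and hand its bytes here.

import numpy as np

def buffer_to_bgr(data: bytes, width: int, height: int) -> np.ndarray:
    """Reinterpret a mapped buffer of 8-bit BGR pixels as an image array
    that can be passed to dlib or drawn in a PyQt5 widget."""
    frame = np.frombuffer(data, dtype=np.uint8)
    return frame.reshape((height, width, 3))
```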
