I am an ophthalmologist, and I wrote Python code for investigating eye movements.
On my desktop:
The code (Python, OpenCV 4.3 built with CUDA and cuDNN, dlib 19, and tiny YOLOv3)
runs on my computer (i5-6600 3.5 GHz CPU, 32 GB RAM, GTX 1080 Ti)
with a USB 4K camera, at these average speeds:
At 3840×2160 resolution:
- when only dlib runs: 0.085 s (about 12 FPS)
- when tiny YOLOv3 is added for both eyes separately: 0.22 s (about 4.5 FPS)
At 1920×1080 resolution:
- when only dlib runs: 0.036 s (about 28 FPS)
- when everything runs (tiny YOLOv3 for both eyes separately): 0.17 s (about 5.9 FPS)
It seems that 5 FPS, or even 4, is enough for my work.
On the Jetson Xavier:
Now I run the same code with the same camera on my Jetson Xavier (MAXN mode). OpenCV 4.3 is built from source with CUDA and cuDNN, dlib is built with DLIB_USE_CUDA enabled, and I use one of these GStreamer pipelines:
gst_url = "v4l2src device=/dev/video0 do-timestamp=false ! image/jpeg, width=3840, height=2160, framerate=30/1 ! jpegdec ! videoconvert ! appsink drop=true max-lateness=1 enable-last-sample=true max-buffers=1"
or
gst_url = "v4l2src device=/dev/video0 ! video/x-raw, framerate=30/1 ! videoscale ! videoconvert ! appsink"
or
gst_url = "v4l2src device=/dev/video0 ! jpegdec ! videoconvert ! appsink"
None of them makes a significant difference:
At 3840×2160 resolution:
- when only dlib runs: 0.068 s (about 14.7 FPS)
- when tiny YOLOv3 is added for both eyes separately: 0.32 s (about 3.1 FPS)
At 1920×1080 resolution:
- when only dlib runs: 0.029 s (about 34 FPS)
- when everything runs (tiny YOLOv3 for both eyes separately): 0.27 s (about 3.7 FPS)
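For context, this is roughly how a pipeline string like the ones above is handed to OpenCV. It is a sketch rather than my exact code, it assumes an OpenCV build with GStreamer support (WITH_GSTREAMER=ON), and the helper name make_mjpeg_pipeline is mine, just for illustration:

```python
# Sketch: building one of the MJPEG pipeline strings above and opening it
# with OpenCV's GStreamer backend. make_mjpeg_pipeline is a hypothetical
# helper, not part of OpenCV.
def make_mjpeg_pipeline(device="/dev/video0", width=3840, height=2160, fps=30):
    return (
        f"v4l2src device={device} do-timestamp=false "
        f"! image/jpeg, width={width}, height={height}, framerate={fps}/1 "
        "! jpegdec ! videoconvert "
        "! appsink drop=true max-lateness=1 enable-last-sample=true max-buffers=1"
    )

# Usage (needs a camera attached and OpenCV built with GStreamer support):
#   import cv2
#   cap = cv2.VideoCapture(make_mjpeg_pipeline(), cv2.CAP_GSTREAMER)
#   ok, frame = cap.read()
```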
In addition, I have a serious camera lag problem: the measured FPS looks fine, but the displayed image lags behind the camera. Setting appsink drop=true max-lateness=1 enable-last-sample=true max-buffers=1 has no significant effect, cap.set(cv2.CAP_PROP_FPS, 5) has minimal effect, and threading the OpenCV capture made no significant difference either.
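For completeness, the "threading in OpenCV capture" I tried looks roughly like this: a background thread keeps reading from the capture and overwrites a single buffered frame, so the consumer always sees the newest frame instead of a stale one queued by the driver. This is a generic sketch (the class name and details are mine), not a guaranteed fix:

```python
import threading

class LatestFrameGrabber:
    """Reads continuously from a capture source on a background thread,
    keeping only the most recent frame. `source` is anything with a
    read() -> (ok, frame) method, e.g. a cv2.VideoCapture."""

    def __init__(self, source):
        self.source = source
        self._lock = threading.Lock()
        self._frame = None
        self._running = False
        self._thread = None

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return self

    def _loop(self):
        while self._running:
            ok, frame = self.source.read()
            if not ok:
                break
            with self._lock:
                self._frame = frame  # overwrite: older frames are dropped

    def read(self):
        """Return the newest frame seen so far (None before first capture)."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        if self._thread is not None:
            self._thread.join(timeout=1.0)
```

In my tests this kind of wrapper did not remove the lag, which is why I suspect the buffering happens below OpenCV, in the driver or GStreamer queue.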
Options:
1 - Python and OpenCV are amateur tools; Nvidia is a more professional company. As a doctor, you should go do your own job and leave this to professional hands: have the code rewritten from scratch in C++ with GStreamer, TensorRT, and more fancy Nvidia stuff.
2 - Change your camera. A CSI camera would work with nvarguscamerasrc (instead of v4l2src), and you could easily use NVMM buffers so the GPU does the work and everything runs faster, without the buffer lag. Of course, you need a CSI carrier board; luckily there are cheap carrier boards for a single CSI camera.
3 - Optimizing the GStreamer pipeline is enough. Even starting from v4l2src, you can route the buffers through the GPU and run faster on Jetson-family embedded boards.
4 - Change the OpenCV settings and code, or rebuild OpenCV with additional flags (on top of CUDA, cuDNN, V4L2, etc., add OpenGL support or something else).
5 - Work on a faster YOLO; that is the bottleneck. Learn the Nvidia stack, such as the DeepStream SDK and TensorRT. Besides, this is ultimately an Nvidia forum, so maybe you may not ask about other stuff, such as GStreamer or OpenCV, as there are legal issues?
6 - You are forgetting the main thing... which is...
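If option 3 were the road, a pipeline that keeps the MJPEG decode on the Jetson's hardware blocks might look like the sketch below. The element names (nvv4l2decoder with its mjpeg property, nvvidconv) come from NVIDIA's accelerated GStreamer stack but vary with JetPack version, so this is an untested starting point, not a working configuration:

```python
# Hypothetical Jetson pipeline for option 3: decode MJPEG on the hardware
# decoder and do the colour conversion via nvvidconv in NVMM memory, only
# converting to system-memory BGR right before appsink. Untested sketch;
# element names depend on the JetPack release.
gst_url = (
    "v4l2src device=/dev/video0 "
    "! image/jpeg, width=3840, height=2160, framerate=30/1 "
    "! nvv4l2decoder mjpeg=1 "
    "! nvvidconv ! video/x-raw, format=BGRx "
    "! videoconvert ! video/x-raw, format=BGR "
    "! appsink drop=true max-buffers=1"
)
```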
Which road should I take? I would appreciate your help. Best regards.