Xavier NX: OpenCV and face_recognition very slow and no GPU use

Face recognition runs very slowly for me, at around 2.4 fps on a 640x480 video stream, and jtop reports no GPU use.
I compiled OpenCV and dlib with CUDA support. When I run cv2.cuda.getCudaEnabledDeviceCount() and dlib.cuda.get_num_devices(), I get:

cv2 version  4.5.2
cv2 Nb Gpu  1
Dlib version  19.22.0
Dlib use cuda  True
Dlib nb Gpu  1
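
For reference, a minimal sketch of a script that could produce a report like the one above. The helper name `cuda_report` is hypothetical, and the imports are guarded (an assumption for portability) so the script still runs on a machine where one of the libraries is missing.

```python
def cuda_report():
    """Collect version and CUDA information for OpenCV and dlib, if installed."""
    report = {"cv2 version": None, "Dlib version": None}
    try:
        import cv2
        report["cv2 version"] = cv2.__version__
        # Returns 0 when OpenCV was built without CUDA support
        report["cv2 Nb Gpu"] = cv2.cuda.getCudaEnabledDeviceCount()
    except (ImportError, AttributeError):
        pass  # OpenCV missing, or built without the cuda module
    try:
        import dlib
        report["Dlib version"] = dlib.__version__
        # True only when dlib itself was compiled with CUDA
        report["Dlib use cuda"] = dlib.DLIB_USE_CUDA
        if dlib.DLIB_USE_CUDA:
            report["Dlib nb Gpu"] = dlib.cuda.get_num_devices()
    except (ImportError, AttributeError):
        pass  # dlib missing, or an unexpected build
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(key, "", value)
```

Note that these checks only show that the libraries were built with CUDA; they do not show that a given call actually runs on the GPU.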

And I use this code:

import face_recognition
import cv2
import os
import pickle 
import time
import dlib
print(cv2.__version__)

Encodings=[]
Names=[]

with open('train.pkl','rb') as f:
    Names=pickle.load(f)
    Encodings=pickle.load(f)
font=cv2.FONT_HERSHEY_SIMPLEX

dispW=640
dispH=480
flip=2
camSet1=('nvarguscamerasrc !  video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, '
         'framerate=60/1 ! nvvidconv flip-method='+str(flip)+' ! video/x-raw, width='+str(dispW)+', height='+str(dispH)+', '
         'format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink')
cam= cv2.VideoCapture(camSet1)

prev_frame_time = 0
new_frame_time = 0

while True:
    prev_frame_time = time.time()
    ret,frame=cam.read()
    if not ret:
        # No frame delivered by the camera pipeline
        break

    # face_recognition expects RGB, OpenCV captures BGR
    frameRGB=cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)

    facePositions=face_recognition.face_locations(frameRGB)
    allEncodings=face_recognition.face_encodings(frameRGB,facePositions)
    for (top,right,bottom,left),face_encoding in zip(facePositions,allEncodings):
        name='Unknown Person'
        matches=face_recognition.compare_faces(Encodings,face_encoding)
        if True in matches:
            first_match_index=matches.index(True)
            name=Names[first_match_index]
        # Draw on the BGR frame, which is what cv2.imshow expects
        cv2.rectangle(frame,(left,top),(right, bottom),(0,0,255),2)
        cv2.putText(frame,name,(left,top-6),font,.75,(0,0,255),2)

    # Time for one full iteration: capture, detection, encoding and matching
    new_frame_time = time.time()
    fps = 1/(new_frame_time-prev_frame_time)
    cv2.putText(frame,str(fps),(10,20),font,.75,(0,0,255),2)
    cv2.imshow('Picture',frame)
    cv2.moveWindow('Picture',0,0)
    if cv2.waitKey(1)==ord('q'):
        break

cam.release()
cv2.destroyAllWindows()

Any advice?

Hi,

If you have compiled dlib with CUDA enabled, it should use the GPU by default.
A possible cause is an IO-related bottleneck that leaves the GPU idle for long periods.
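
One way to check for such a bottleneck is to time each stage of the loop separately. The following is a minimal sketch using only the standard library; `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls merely stand in for the real capture and detection calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage across loop iterations."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self, name):
        """Average duration of one stage in milliseconds."""
        return 1000.0 * self.totals[name] / self.counts[name]

    def summary(self):
        return {name: self.mean_ms(name) for name in self.totals}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("capture"):   # stand-in for cam.read()
            time.sleep(0.001)
        with timer.stage("detect"):    # stand-in for face_locations(...)
            time.sleep(0.005)
    for name, ms in timer.summary().items():
        print(f"{name}: {ms:.1f} ms per frame")
```

Wrapping `cam.read()`, `face_locations` and `face_encodings` in separate stages would show which of them dominates the roughly 400 ms per frame implied by 2.4 fps.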

We are checking this in our environment and will share more information with you later.

Thanks.

Hi,

To reproduce this issue, could you also share the train.pkl with us?

Thanks.

train.pkl (3.3 KB)

I also forgot to specify that I am on JetPack 4.5.1.

Hi,

Confirmed that the 0% GPU utilization can be reproduced in our environment.

We are checking this internally.
Will share more information with you later.

Thanks.

Hi, I think I have the same issue with dlib 19.22.0. As I mentioned in this thread, with this version the GPU of my Jetson Nano does not seem to be used, even though dlib was compiled with CUDA enabled.

I have partially fixed my issue: if I lower the capture resolution from 640x480 to 300x300, the frame rate goes from 2.25 to 12 fps and the GPU seems to be used, but at less than 50%.
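
A related option, which is not what was done here, is to keep the capture resolution and run detection on a downscaled copy of each frame, then map the boxes back to full-size coordinates. A minimal sketch; `scale_locations` is a hypothetical helper, and the (top, right, bottom, left) order follows what `face_recognition.face_locations` returns.

```python
def scale_locations(locations, factor):
    """Map (top, right, bottom, left) boxes found on a downscaled frame back
    to the full-size frame by multiplying every coordinate by `factor`."""
    return [tuple(int(round(v * factor)) for v in box) for box in locations]


# Intended use inside the capture loop (assumes cv2 and face_recognition):
#   small = cv2.resize(frameRGB, (0, 0), fx=0.5, fy=0.5)
#   boxes = face_recognition.face_locations(small)
#   boxes = scale_locations(boxes, 2.0)   # back to full-size coordinates

if __name__ == "__main__":
    print(scale_locations([(10, 60, 50, 20)], 2.0))  # [(20, 120, 100, 40)]
```

Smaller faces may be missed at the reduced size, so the scale factor is a trade-off between speed and detection range.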

I have also changed this in my code:

face_recognition.face_locations(image)

to

face_recognition.face_locations(image, model="cnn")

After that, the frame rate reaches 15 fps.
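
For context, `face_locations` defaults to model="hog", a CPU detector, while model="cnn" selects dlib's CNN detector, which can use CUDA. A minimal sketch of choosing the model from the build flag; `pick_detection_model` and `locate_faces` are hypothetical helpers, and the detector is passed in as an argument only so the sketch runs without the library installed.

```python
def pick_detection_model(cuda_enabled):
    """Return the model name for face_locations.

    "cnn" runs dlib's CNN detector (GPU-accelerated when dlib has CUDA);
    "hog" is the CPU detector and the library default.
    """
    return "cnn" if cuda_enabled else "hog"


def locate_faces(image, cuda_enabled, detector):
    """Call a face_locations-style function with the chosen model.

    `detector` is expected to be face_recognition.face_locations.
    """
    return detector(image, model=pick_detection_model(cuda_enabled))


# Intended use (assumes dlib and face_recognition are installed):
#   import dlib, face_recognition
#   boxes = locate_faces(frameRGB, dlib.DLIB_USE_CUDA,
#                        face_recognition.face_locations)
```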

I think 15 fps on a Xavier NX is quite low, especially at 300x300 and with both the GPU and the CPU below 50% utilization. What is the expected frame rate?

Hi,

This depends on the implementation of the face_recognition module.

As shown below, a different algorithm is used when model == "cnn" is set.
https://github.com/ageitgey/face_recognition/blob/master/face_recognition/api.py#L108

The underlying detector is cnn_face_detection_model_v1 in the dlib library.
However, the implementation is not pure CUDA inference; it also involves several CPU<->GPU buffer copies.

Since the GPU utilization doesn’t reach 99%, the implementation appears to be bound by data transfer rather than computation.
https://github.com/davisking/dlib/blob/master/tools/python/src/cnn_face_detector.cpp
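
To make this reasoning concrete: if each frame costs a transfer time plus a compute time, and the GPU is only busy during compute, then utilization is compute / (compute + transfer). The sketch below uses made-up timings purely for illustration; they are not measurements from any device, and `gpu_utilization` and `frames_per_second` are hypothetical helpers.

```python
def gpu_utilization(compute_ms, transfer_ms):
    """Fraction of each frame during which the GPU is computing, under the
    simplifying assumption that copies and compute never overlap."""
    return compute_ms / (compute_ms + transfer_ms)


def frames_per_second(compute_ms, transfer_ms):
    """Frame rate when every frame pays both costs back to back."""
    return 1000.0 / (compute_ms + transfer_ms)


if __name__ == "__main__":
    # Hypothetical numbers chosen for illustration only
    print(gpu_utilization(25.0, 25.0))    # 0.5
    print(frames_per_second(25.0, 25.0))  # 20.0
```

Under this simple model, a GPU that is busy for less than half of each frame points to the copies, not the inference itself, as the limit on frame rate.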

Thanks.