Xavier NX opencv and face_recognition very slow and no GPU

telemac73 · May 17, 2021, 11:48am

Face recognition runs very slowly for me, around 2.4 fps on a 640x480 video stream, and jtop reports no GPU use.
I compiled OpenCV and dlib with CUDA support. When I run cv2.cuda.getCudaEnabledDeviceCount() and dlib.cuda.get_num_devices(), I get:

cv2 version  4.5.2
cv2 Nb Gpu  1
Dlib version  19.22.0
Dlib use cuda  True
Dlib nb Gpu  1
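
A report like the one above can be gathered with a short script along these lines. This is only a sketch: `cuda_report` is a hypothetical helper name, and it assumes the "Dlib use cuda" line comes from `dlib.DLIB_USE_CUDA`. Libraries that are missing are reported as errors instead of stopping the script.

```python
def cuda_report():
    """Collect CUDA status from OpenCV and dlib into a dict."""
    report = {}
    try:
        import cv2
        report["cv2_version"] = cv2.__version__
        # Returns 0 when OpenCV was built without CUDA support.
        report["cv2_cuda_devices"] = cv2.cuda.getCudaEnabledDeviceCount()
    except Exception as exc:  # broad on purpose: this is a diagnostic script
        report["cv2_error"] = repr(exc)
    try:
        import dlib
        report["dlib_version"] = dlib.__version__
        report["dlib_use_cuda"] = dlib.DLIB_USE_CUDA
        report["dlib_cuda_devices"] = dlib.cuda.get_num_devices()
    except Exception as exc:
        report["dlib_error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(key, value)
```

Note that these values only say the libraries were built with CUDA. They do not show whether a given call actually runs on the GPU.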

I use this code:

import face_recognition
import cv2
import os
import pickle 
import time
import dlib
print(cv2.__version__)

Encodings=[]
Names=[]

with open('train.pkl','rb') as f:
    Names=pickle.load(f)
    Encodings=pickle.load(f)
font=cv2.FONT_HERSHEY_SIMPLEX

dispW=640
dispH=480
flip=2
camSet1=('nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, '
         'framerate=60/1 ! nvvidconv flip-method='+str(flip)+' ! video/x-raw, width='+str(dispW)+', height='+str(dispH)+', '
         'format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink')
cam= cv2.VideoCapture(camSet1)

prev_frame_time = 0
new_frame_time = 0

while True:
    prev_frame_time = time.time()
    ret,frame=cam.read()
    if not ret:
        break  # no frame delivered by the camera pipeline

    # face_recognition expects RGB, OpenCV delivers BGR
    frameRGB=cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)

    facePositions=face_recognition.face_locations(frameRGB)
    allEncodings=face_recognition.face_encodings(frameRGB,facePositions)
    for (top,right,bottom,left),face_encoding in zip(facePositions,allEncodings):
        name='Unknown Person'
        matches=face_recognition.compare_faces(Encodings,face_encoding)
        if True in matches:
            first_match_index=matches.index(True)
            name=Names[first_match_index]
        # draw on the BGR frame, because cv2.imshow expects BGR
        cv2.rectangle(frame,(left,top),(right, bottom),(0,0,255),2)
        cv2.putText(frame,name,(left,top-6),font,.75,(0,0,255),2)
    new_frame_time = time.time()
    fps = 1/(new_frame_time-prev_frame_time)
    cv2.putText(frame,str(fps),(10,20),font,.75,(0,0,255),2)
    cv2.imshow('Picture',frame)
    cv2.moveWindow('Picture',0,0)
    if cv2.waitKey(1)==ord('q'):
        break

cam.release()
cv2.destroyAllWindows()

Any advice?

AastaLLL · May 18, 2021, 5:28am

Hi,

If you have compiled dlib with CUDA enabled, it should use the GPU by default.
A possible cause is an IO-related bottleneck that leaves the GPU idle for long periods.
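
One way to check for such a bottleneck is to time each stage of the loop separately. The sketch below is an illustration, not something from the thread: `StageTimer` is a hypothetical helper built only on the standard library, and the stage names in the usage comments are assumptions.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a processing loop."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Return the mean duration of each stage in milliseconds."""
        return {name: 1000.0 * self.totals[name] / self.counts[name]
                for name in self.totals}


# Usage inside the capture loop (names taken from the code posted above):
#   timer = StageTimer()
#   with timer.stage("capture"):
#       ret, frame = cam.read()
#   with timer.stage("detect"):
#       facePositions = face_recognition.face_locations(frameRGB)
#   with timer.stage("encode"):
#       allEncodings = face_recognition.face_encodings(frameRGB, facePositions)
#   print(timer.mean_ms())
```

If "capture" dominates, the camera pipeline is the limit. If "detect" dominates while jtop shows an idle GPU, the detector is probably running on the CPU.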

We are checking this in our environment and will share more information with you later.

Thanks.

AastaLLL · May 18, 2021, 6:13am

Hi,

To reproduce this issue, could you also share the train.pkl with us?

Thanks.

telemac73 · May 18, 2021, 8:02am

train.pkl (3.3 KB)

I also forgot to specify that I am on JetPack 4.5.1.

AastaLLL · May 27, 2021, 9:13am

Hi,

Confirmed that the 0% GPU utilization can be reproduced in our environment.

We are checking this internally.
Will share more information with you later.

Thanks.

infodev · May 28, 2021, 8:50am

Hi, I think I have the same issue with dlib 19.22.0. As I mentioned in this thread, with this version the GPU of my Jetson Nano does not seem to be used, even when dlib is compiled with CUDA enabled.

telemac73 · May 28, 2021, 11:13am

I have partially fixed my issue: if I lower the capture resolution from 640x480 to 300x300, the frame rate goes from 2.25 to 12 fps and the GPU seems to be used, but at less than 50%…

I have also changed this in my code:

 face_recognition.face_locations(image)

to

face_recognition.face_locations(image, model="cnn")

After that, the frame rate reaches 15 fps.

I think 15 fps on a Xavier NX, especially at 300x300, is quite low, given that both the GPU and the CPU are below 50% utilization. What is the expected fps?
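
For anyone who wants the speedup from a lower resolution but still needs boxes in full-frame coordinates, a common pattern is to detect on a downscaled copy and scale the boxes back. This is only a sketch: `scale_locations` is a hypothetical helper, and the 0.5 factor in the usage comments is an arbitrary assumption.

```python
def scale_locations(locations, factor):
    """Map (top, right, bottom, left) boxes found on a downscaled frame
    back to pixel coordinates of the full-size frame."""
    return [tuple(int(round(value * factor)) for value in box)
            for box in locations]


# Usage with the libraries from this thread (not executed here):
#   small = cv2.resize(frameRGB, (0, 0), fx=0.5, fy=0.5)
#   boxes = face_recognition.face_locations(small, model="cnn")
#   full_size_boxes = scale_locations(boxes, 1 / 0.5)
```

Smaller faces may be missed at lower resolutions, so the factor is a trade-off between speed and detection range.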

AastaLLL · June 8, 2021, 6:51am

Hi,

This depends on the implementation of the face_recognition module.

As you can see below, a different algorithm is used when model == "cnn" is set.
https://github.com/ageitgey/face_recognition/blob/master/face_recognition/api.py#L108

The underlying detector is cnn_face_detection_model_v1 in the dlib library.
However, the implementation is not pure CUDA inference; it also involves several CPU<->GPU buffer copies.

Since the GPU utilization doesn’t reach 99%, this indicates that the implementation is bound by data transfer rather than computation.
https://github.com/davisking/dlib/blob/master/tools/python/src/cnn_face_detector.cpp
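
One thing that may be worth trying when per-call overhead dominates is to hand the CNN detector several frames at once through face_recognition.batch_face_locations. The sketch below is an assumption-laden illustration: `batched` and `detect_in_batches` are hypothetical helpers, the batch size of 8 is arbitrary, all frames must have the same size, and batching adds latency to a live stream. Nothing in this thread confirms how much it helps on a Jetson.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size consecutive frames."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


def detect_in_batches(frames, batch_size=8):
    """Yield (frame, face_locations) pairs, one detector call per batch."""
    # Imported lazily so batched() stays usable without the library.
    import face_recognition
    for batch in batched(frames, batch_size):
        results = face_recognition.batch_face_locations(
            batch, number_of_times_to_upsample=0, batch_size=len(batch))
        yield from zip(batch, results)
```

Note that batch_face_locations returns boxes in the same (top, right, bottom, left) order as face_locations, so the rest of the loop can stay unchanged.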

Thanks.
