OpenCV built-in face recognition CNN model takes around 500 ms (poor performance) on Jetson Nano. Help!

Hello Experts,

CC: @Honey_Patouceul @DaneLLL @Amycao @kayccc @icornejo.a @AastaLLL @dusty_nv @forumuser @Jeffli @fpsychosis

I am using Python to run face recognition on a Jetson Nano evaluation board with JetPack 4.4.

I have already compiled dlib with CUDA enabled and installed it.

When I profile the built-in CNN face recognition model, it takes around 500 ms per frame, which feels very high for such a powerful GPU on a simple 720p image.
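For reference, here is a minimal sketch of how I am timing it (assuming the dlib-backed face_recognition package; the image path is a placeholder):

import time
import face_recognition  # dlib-backed; assumes dlib was built with CUDA

image = face_recognition.load_image_file("frame_720p.jpg")  # placeholder path

# Warm-up call so CUDA context creation is not counted in the measurement
face_recognition.face_locations(image, model="cnn")

start = time.time()
boxes = face_recognition.face_locations(image, model="cnn")
print("CNN face detection: %.1f ms" % ((time.time() - start) * 1000.0))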


Hi,

Would you mind checking the GPU utilization with tegrastats first?
To reach optimal performance, you would expect to see ~99% GPU utilization.

$ sudo tegrastats

If I remember correctly, the built-in DNN inference in OpenCV is a CPU implementation rather than a GPU one.
You will need to build OpenCV from source with the extra modules (opencv_contrib) to get GPU support.
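Once OpenCV is rebuilt with CUDA support, you can ask the DNN module to run on the GPU. A minimal sketch (the model file below is the OpenFace Torch model commonly used with cv2.dnn for face embeddings; file paths are placeholders):

import cv2

# Requires OpenCV >= 4.2 built with CUDA and the opencv_contrib modules;
# without that, these flags fall back to the CPU backend.
net = cv2.dnn.readNetFromTorch("openface_nn4.small2.v1.t7")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

face = cv2.imread("face.jpg")  # placeholder: an already-cropped face image
blob = cv2.dnn.blobFromImage(face, 1.0 / 255, (96, 96),
                             (0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
embedding = net.forward()  # 128-D face embedding vector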

Here is a tutorial on building OpenCV from source for your reference:

Thanks.

hi techguyz:
1: As @AastaLLL said, make sure the GPU is utilized.
2: To fully speed up your model's inference time, you should use TensorRT to optimize your model and create a TRT engine, which is the preferred way to run it on the GPU.

Hi @AastaLLL

Please find the tegrastats logs below. It seems the GPU is already fully utilized.

60/6205 POM_5V_GPU 2432/2655 POM_5V_CPU 1144/1200
RAM 2731/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [63%@1479,15%@1479,6%@1479,42%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@47.5C CPU@51C PMIC@100C GPU@51.5C AO@55.5C thermal@51C POM_5V_IN 6403/6244 POM_5V_GPU 3106/2745 POM_5V_CPU 1107/1182
RAM 2734/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [45%@1479,15%@1479,27%@1479,48%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@47.5C CPU@51C PMIC@100C GPU@51C AO@56C thermal@51.25C POM_5V_IN 6046/6211 POM_5V_GPU 2499/2704 POM_5V_CPU 1144/1175
RAM 2731/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [34%@1479,29%@1479,14%@1479,44%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@47.5C CPU@51C PMIC@100C GPU@51.5C AO@55.5C thermal@51.25C POM_5V_IN 6368/6234 POM_5V_GPU 2856/2726 POM_5V_CPU 1073/1161
RAM 2732/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [50%@1479,37%@1479,27%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@47.5C CPU@51.5C PMIC@100C GPU@51.5C AO@56C thermal@51.5C POM_5V_IN 6200/6229 POM_5V_GPU 2683/2720 POM_5V_CPU 1218/1168
RAM 2732/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [59%@1479,15%@1479,43%@1479,12%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@47.5C CPU@51C PMIC@100C GPU@51.5C AO@56C thermal@51.25C POM_5V_IN 6403/6249 POM_5V_GPU 3106/2763 POM_5V_CPU 1071/1157
RAM 2732/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [52%@1479,18%@1479,37%@1479,20%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@47.5C CPU@51.5C PMIC@100C GPU@51C AO@56C thermal@52.25C POM_5V_IN 6368/6260 POM_5V_GPU 2964/2783 POM_5V_CPU 1001/1141
RAM 2734/3956MB (lfb 2x4MB) SWAP 530/1978MB (cached 38MB) CPU [50%@1479,13%@1479,32%@1479,43%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@48C CPU@51C PMIC@100C GPU@52C AO@56C thermal@51.5C POM_5V_IN 6153/6

Hello @Jeffli

As the model comes bundled with OpenCV, is there any guide on how to load it, optimize it, and plug the optimized version back into OpenCV?

Also, when I initially installed dlib without CUDA support, enabling the CNN model caused the device to hang.
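A quick check to confirm whether the installed dlib build actually has CUDA enabled (using dlib's own flags) is:

import dlib

# True only if dlib was compiled with CUDA support (DLIB_USE_CUDA=1)
print("dlib built with CUDA:", dlib.DLIB_USE_CUDA)
print("CUDA devices visible:", dlib.cuda.get_num_devices())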

hi techguyz:
This face recognition model (FaceNet) seems to come from Torch (t7 format), so there is some extra work involved if you want to speed it up.
1: Try converting the t7 file to ONNX; TensorRT supports ONNX. Here is a reference guide, but I am not sure whether it will work correctly.

2: Use TensorRT to convert the ONNX file into a TensorRT engine file; this is also not an easy step (see the sketch below).
3: If you can successfully get the engine file (the model optimized via TensorRT), use the C++ interface of TensorRT to integrate it into your OpenCV code, load your engine file, and run inference from it.
This is the guide for the C++ interface of TRT.

None of these steps is easy to get done; the only way is to try, and keep trying, if you want to fully use the GPU's capacity.
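As a rough illustration of step 2, here is a minimal sketch of building an engine from an ONNX file with the TensorRT 7 Python API that ships with JetPack 4.4 (the file names are placeholders, and this assumes the ONNX model parses cleanly):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Build a TensorRT engine from an ONNX file (TensorRT 7.x API on JetPack 4.4).
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(explicit_batch) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 28  # 256 MB of build workspace
    with open("facenet.onnx", "rb") as f:  # placeholder ONNX file name
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parsing failed")
    engine = builder.build_cuda_engine(network)
    with open("facenet.engine", "wb") as out:  # serialized engine for reuse
        out.write(engine.serialize())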

Hi @Jeffli

This looks a bit complex, and my Torch-to-ONNX conversion didn't work as expected.

Is there any other quick hack? I am OK with using a new model as long as the performance is good.

hi techguyz:
The easy way is to find a model that already supports TRT, so you just need to convert it to a TRT engine file; otherwise you have to speed the model up with TensorRT yourself.
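For example, the jetson-inference project ships models that are already TensorRT-optimized. A minimal sketch using its bundled "facenet" face detection network (assuming a recent jetson-inference build; the image path is a placeholder, and the first run spends a few minutes building the engine):

import jetson.inference
import jetson.utils

# "facenet" is jetson-inference's TensorRT-optimized face detection model
net = jetson.inference.detectNet("facenet", threshold=0.5)

img = jetson.utils.loadImage("face.jpg")  # placeholder image path
detections = net.Detect(img)
print("Detected {:d} face(s)".format(len(detections)))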