I am using Python to run face recognition on a Jetson Nano evaluation board with JetPack 4.4.
I have already compiled dlib with CUDA enabled and installed it.
When I profile the built-in CNN face recognition model, it takes around 500 ms, which feels very high for such a powerful GPU on a simple 720p image.
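For reference, a minimal sketch of how that 500 ms figure might be measured, assuming the face_recognition package on top of dlib and a placeholder 720p image file (both the package and the file name are assumptions, not details from the post):

import time

import face_recognition  # assumed wrapper package around dlib's CNN detector

# Placeholder 720p test image; substitute your own file.
image = face_recognition.load_image_file("test_720p.jpg")

# Warm-up call so one-time CUDA initialization is not counted in the timing.
face_recognition.face_locations(image, model="cnn")

runs = 10
start = time.perf_counter()
for _ in range(runs):
    boxes = face_recognition.face_locations(image, model="cnn")
elapsed_ms = (time.perf_counter() - start) / runs * 1000
print(f"average CNN detection time: {elapsed_ms:.1f} ms, faces found: {len(boxes)}")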
Would you mind checking the GPU utilization with tegrastats first?
To reach optimal performance, you would expect to see ~99% GPU utilization (reported in the GR3D_FREQ field of the output).
$ sudo tegrastats
If I remember correctly, the built-in inference in OpenCV is a CPU implementation rather than a GPU one.
You will need to build OpenCV from source with the extra modules (opencv_contrib) to get GPU support.
Here is a tutorial on building OpenCV from source for your reference:
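If you rebuild OpenCV with CUDA and the DNN module enabled, a minimal sketch of routing the DNN inference onto the GPU could look like the following; the .t7 file name, input size, and preprocessing here are placeholders based on the common OpenFace model, not details taken from this thread:

import cv2

# Placeholder Torch model; substitute the .t7 file you are actually using.
net = cv2.dnn.readNetFromTorch("openface_nn4.small2.v1.t7")

# Route inference to the GPU; this only takes effect if OpenCV was built
# with the opencv_contrib modules and the CUDA DNN backend enabled.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

image = cv2.imread("face_720p.jpg")
blob = cv2.dnn.blobFromImage(image, 1.0 / 255, (96, 96), (0, 0, 0),
                             swapRB=True, crop=False)
net.setInput(blob)
embedding = net.forward()
print(embedding.shape)

Without the CUDA backend set, cv2.dnn runs on the default CPU path, which would match the slow timings described above.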
hi techguyz:
1: as @AastaLLL said, make sure the GPU is actually being utilized.
2: to fully speed up your model's inference time, you should use TensorRT to optimize the model and create a TRT engine, which is the preferred way to run it on the GPU.
hi techguyz:
this face recognition model (facenet) seems to come from Torch (t7 format), so there is some more work for you if you want to speed it up.
1: try converting this t7 file to ONNX; TensorRT supports ONNX. Here is a reference guide, but I am not sure whether it will work correctly.
2: use TensorRT to convert the ONNX file into a TensorRT engine file; this is also not an easy step (see the sketch after this list).
3: if you can successfully get the engine file (the model optimized via TensorRT), use the C++ interface of TensorRT to integrate it into your OpenCV code, load your engine file, and run inference from it.
this is the guide for the C++ interface of TRT:
none of these steps is easy; the only way is to try and try again if you want to fully use the GPU's capacity.
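As a rough illustration of step 2, a sketch of building an engine from an ONNX file with the TensorRT Python API that ships with JetPack 4.4 (TensorRT 7.x) might look like this; the file names are placeholders:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path):
    # Explicit-batch networks are required for ONNX models in TensorRT 7.
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 28   # 256 MiB; keep it modest on the Nano
        config.set_flag(trt.BuilderFlag.FP16)  # FP16 gives a large speedup on the Nano
        engine = builder.build_engine(network, config)
        if engine is not None:
            with open(engine_path, "wb") as f:
                f.write(engine.serialize())
        return engine

build_engine("facenet.onnx", "facenet.engine")

The trtexec tool bundled with TensorRT can do the same ONNX-to-engine conversion from the command line if you prefer not to script it.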
hi techguyz:
the easy way is to find a model that already has TRT support, so you only need to convert it to a TRT engine file; otherwise you will have to speed up the model with TensorRT yourself.