How to do inference with fpenet_fp32.trt

Hi,

I am using TLT v3 and fetched the fpenet model using

ngc registry model download-version nvidia/tlt_fpenet:trainable_v1.0

Then, I converted the fpenet.tlt model to model.etlt using fpenet export. After that, using the tlt-converter tool, I created a .trt engine file in order to use it in a Python application.

Here is the tlt-converter command I used:

tlt-converter fpenet.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -t fp32 -e fpenet_b1_fp32.trt

The problem I am facing is that the inference output is not what it is supposed to be; compared with the output of fpenet inference, it is not even close. I suspect that either the conversion step is causing the issue or my preprocessing is not what it should be.

Here is the inference code I am using:

test.py (4.8 KB)
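Roughly, the script deserializes the engine, pins the dynamic batch dimension to 1, copies an 80x80 float32 grayscale face crop into the input binding, runs inference, and reads back the two output tensors. A condensed sketch of what it does (the full test.py is attached above; buffer handling and names are simplified here):

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

class FpeNet:
    def __init__(self, engine_path):
        # Deserialize the engine built by tlt-converter.
        with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        # The engine has a dynamic batch dimension; pin it to 1.
        self.context.set_binding_shape(0, (1, 1, 80, 80))
        # Allocate page-locked host buffers and device buffers per binding.
        self.host_bufs, self.dev_bufs, self.bindings = [], [], []
        for i in range(self.engine.num_bindings):
            shape = self.context.get_binding_shape(i)
            dtype = trt.nptype(self.engine.get_binding_dtype(i))
            host = cuda.pagelocked_empty(trt.volume(shape), dtype)
            dev = cuda.mem_alloc(host.nbytes)
            self.host_bufs.append(host)
            self.dev_bufs.append(dev)
            self.bindings.append(int(dev))

    def infer(self, face_crop):
        # face_crop: (1, 1, 80, 80) float32 grayscale face crop.
        np.copyto(self.host_bufs[0], face_crop.ravel())
        cuda.memcpy_htod(self.dev_bufs[0], self.host_bufs[0])
        self.context.execute_v2(self.bindings)
        for host, dev in zip(self.host_bufs[1:], self.dev_bufs[1:]):
            cuda.memcpy_dtoh(host, dev)
        # The converter log reports two output tensors; the landmark
        # coordinates come back in the 80x80 input space.
        return self.host_bufs[1:]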

and I run it with this command: python3 test.py --input p.jpg

[image: p2]

The output is this:

[image: landmarks]

However, the output that the fpenet inference command creates by following fpenet.ipynb is this:

[image: fpenet inference output]

You can clearly see that the first output is not sensible at all.

Could you please help me find the problem in the process?
Thanks for your help.


For fpenet in TLT, by default, there are only two ways to run inference: the fpenet inference command and the TLT CV Inference Pipeline. See the Facial Landmarks Estimation sections of the Transfer Learning Toolkit 3.0 documentation.

I have already read the fpenet documentation in the TLT 3.0 docs. However, what I need is to run inference from Python code. fpenet inference and the TLT CV Inference Pipeline do not fit my requirements, because I need to perform face alignment using the facial landmarks in a real-time face recognition pipeline built in Python. Note that I created the .trt engine file based on your previous answer.

My problem is that the inference output from the engine file differs significantly from the model.tlt output and is not accurate. I would appreciate it if you could help me solve this or figure out what the problem is.

Thanks.

Please crop the face bounding box to a square bbox, then run your code on that square crop. You will get the same result as fpenet inference.
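One way to square the box before cropping, as a rough sketch (the (x, y, w, h) bbox format is an assumption here; adapt it to whatever your face detector returns):

def square_crop(img, bbox):
    # bbox: (x, y, w, h) from your face detector (format assumed here).
    x, y, w, h = bbox
    # Use the longer side, but keep the square inside the image.
    side = int(min(max(w, h), img.shape[0], img.shape[1]))
    # Center the square on the original box and clamp it to the image.
    x0 = int(round(x + w / 2 - side / 2))
    y0 = int(round(y + h / 2 - side / 2))
    x0 = max(0, min(x0, img.shape[1] - side))
    y0 = max(0, min(y0, img.shape[0] - side))
    return img[y0:y0 + side, x0:x0 + side]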

Hi,

I tried my code with a square input image, but the result is not the same as the fpenet inference output.

Here is the output I get after feeding a 400x400 image to my code:
[image: landmarks]

I also tried a couple of slightly different bounding boxes of different sizes, but the output is still not usable.

Could you please share an input image and/or possible edits you applied to my code?

Thanks.

[image: test.png]

Run the following command against the above PNG file.
$ python3 test.py --input test.png

or [image: test_1.png]

$ python3 test.py --input test_1.png

This is the output of running test.py --input test.png. Still, it is not satisfactory.

[image: landmarks]

Do you have any idea what might be wrong?

On my side, it works well.
[image: landmarks]

My step:
# tlt-converter pretrained_models/public/model.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -t fp32 -e fpenet_b1_fp32.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 1, 80, 80)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile max shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Detected 1 inputs and 2 output network tensors.

Modify your code to

fpenet_obj = FpeNet('fpenet_b1_fp32.trt')

# python3 test.py --input test.png

I tried to do the same as Aref, but I also got an unsatisfactory result. Here are the config, steps, and result. I used your recommended resized picture as input.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:21:01.0 Off |                    0 |
| N/A   52C    P0    28W /  70W |   5457MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

tlt-converter tlt_fpenet_vdeployable_v1.0/model.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -t fp32 -e fpenet_b1_fp32.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 1, 80, 80)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile max shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Detected 1 inputs and 2 output network tensors.

[image: landmarks]

[image: landmarks]


Please retry with the modification below.

img_np = img_np.astype(np.float32) / 255

to

img_np = img_np.astype(np.float32)
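In other words, the network appears to expect raw 0-255 grayscale pixel values rather than input normalized to [0, 1]. With the fix applied, the preprocessing looks roughly like this (function and variable names are illustrative, not taken from test.py):

import cv2
import numpy as np

def preprocess(image_path):
    # Load as grayscale and resize to the network's 80x80 input.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (80, 80))
    # Keep the raw 0-255 range; dividing by 255 here is what broke inference.
    img_np = img.astype(np.float32)
    # NCHW layout for the engine: (1, 1, 80, 80).
    return img_np.reshape(1, 1, 80, 80)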


This solved the issue. Thanks!

Did you use this on a Jetson NX or a PC? What is the approximate FPS of the 68-point facial landmark model? Is it faster than the Dlib model? Thanks!

Hi @154221990 ,

I used this on an x86 PC with an NVIDIA GTX 1080 Ti. The approximate frame rate is about 1500 FPS. Yes, it is much faster than the dlib model: I got about 1000 FPS on my CPU with dlib's 5-point landmark detector. I did not test dlib's 68-point landmark detector, though.
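For what it's worth, I measured that number with a simple loop along these lines (a rough sketch; it times inference only, not face detection or image I/O, and fpenet_obj is the wrapper from earlier in the thread):

import time

def benchmark(fpenet_obj, face_crop, n=1000):
    # Warm up so one-time CUDA initialization doesn't skew the timing.
    for _ in range(10):
        fpenet_obj.infer(face_crop)
    start = time.perf_counter()
    for _ in range(n):
        fpenet_obj.infer(face_crop)
    elapsed = time.perf_counter() - start
    print(f'{n / elapsed:.0f} FPS')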

Also, refer to NVIDIA NGC for more info and the official benchmarks.


Many thanks for your reply.