How to do inference with fpenet_fp32.trt

Hi,

I am using TLT v3 and fetched the fpenet model using

ngc registry model download-version nvidia/tlt_fpenet:trainable_v1.0

Then, I converted the fpenet.tlt model to model.etlt using fpenet export. After that, using the tlt-converter tool, I created a .trt engine file so that I can use it in a Python application.

Here is the tlt-converter command I used:

tlt-converter fpenet.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -t fp32 -e fpenet_b1_fp32.trt

The problem I am facing is that the inference output is not what it is supposed to be; i.e., when compared with the output of fpenet inference, it is not even close. I suspect that either the conversion step is causing the issue, or my preprocessing is not what it should be.

Here is the inference code I am using:

test.py (4.8 KB)

and I run it with this command: python3 test.py --input p.jpg
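Since test.py is only attached above and not reproduced inline, here is a minimal sketch, under stated assumptions, of the kind of TensorRT Python inference such a script performs. The engine filename and the input tensor name input_face_images:0 come from the converter command above; the preprocessing, binding order, and output layout are assumptions to be checked against the actual script.

# Minimal sketch (not the attached test.py) of running the fpenet_b1_fp32.trt
# engine with the TensorRT 7 Python API and pycuda. Preprocessing and output
# handling are assumptions based on the 1x1x80x80 input in the command above.
import cv2
import numpy as np
import pycuda.autoinit  # creates and holds a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine produced by tlt-converter.
with open("fpenet_b1_fp32.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# The engine was built with a dynamic batch dimension, so pin it to 1.
input_idx = engine.get_binding_index("input_face_images:0")
context.set_binding_shape(input_idx, (1, 1, 80, 80))

# Allocate page-locked host buffers and device buffers for every binding.
stream = cuda.Stream()
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Assumed preprocessing: grayscale square face crop, resized to 80x80,
# cast to float32 (see the normalization discussion later in this thread).
img = cv2.resize(cv2.imread("p.jpg", cv2.IMREAD_GRAYSCALE), (80, 80))
np.copyto(host_bufs[input_idx], img.astype(np.float32).ravel())

# Copy in, run, copy out.
cuda.memcpy_htod_async(dev_bufs[input_idx], host_bufs[input_idx], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()

# The two output bindings hold the landmark coordinates and confidences;
# check engine.get_binding_name(i) and the binding shapes for their layout.
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), context.get_binding_shape(i))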

[image: p2]

The output is this:

[image: landmarks]

However, the output that the fpenet inference command creates by following fpenet.ipynb is this:

You can clearly see that the first output is not sensible at all.

Could you please help me find the problem in the process?
Thanks for your help.


For fpenet in TLT, by default, there are only two ways to run inference: fpenet inference (Facial Landmarks Estimation — Transfer Learning Toolkit 3.0 documentation) and the TLT CV Inference Pipeline (Facial Landmarks Estimation — Transfer Learning Toolkit 3.0 documentation).

I have already read the fpenet documentation in the TLT 3.0 docs. However, what I need is to run inference from Python code. Neither fpenet inference nor the TLT CV Inference Pipeline fits my requirements, because I need to perform face alignment using the facial landmarks inside a real-time face recognition pipeline built in Python. Note that I created the .trt engine file following your previous answer.

My problem is that the inference output from the engine file differs significantly from the model.tlt output and is not accurate. I would appreciate it if you could help me solve this or find out what the problem is.

Thanks.

Please crop the face bounding box to be a square bbox. Then run your code to do inference against this square bbox. It will then give the same result as fpenet inference.
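For reference, a minimal sketch of one way to turn a detector's (x, y, w, h) face box into a square crop before resizing to the 80x80 network input; the bbox format and the cv2-based pipeline are assumptions, not code from the TLT apps.

# Hypothetical helper: expand an (x, y, w, h) face bbox into a square crop,
# clipped to the image borders, before resizing to the 80x80 network input.
import cv2

def square_crop(image, x, y, w, h):
    cx, cy = x + w / 2.0, y + h / 2.0   # bbox centre
    side = max(w, h)                    # square side length
    x0 = int(max(cx - side / 2.0, 0))
    y0 = int(max(cy - side / 2.0, 0))
    x1 = int(min(x0 + side, image.shape[1]))
    y1 = int(min(y0 + side, image.shape[0]))
    return image[y0:y1, x0:x1]

# Example usage (bbox values are placeholders):
# face = square_crop(frame, 100, 60, 180, 200)
# face = cv2.resize(cv2.cvtColor(face, cv2.COLOR_BGR2GRAY), (80, 80))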

Hi,

I tried my code with a square input image, but the result is not the same as the fpenet inference output.

Here is the output I get after feeding a 400x400 image to my code:
[image: landmarks]

I also tried a couple of slightly different bounding boxes with different sizes, but the output is still not right.

Could you please share an input image and/or possible edits you applied to my code?

Thanks.

[image: test.png]

Run the following command against the png file above.
$ python3 test.py --input test.png

or [image: test_1.png]

$ python3 test.py --input test_1.png

This is the output of running test.py --input test.png. Still, it is not satisfactory.

[image: landmarks]

Do you have any idea about it?

On my side, it works well.
[image: landmarks]

My step:
# tlt-converter pretrained_models/public/model.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -t fp32 -e fpenet_b1_fp32.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 1, 80, 80)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile max shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Detected 1 inputs and 2 output network tensors.

Modify your code to

fpenet_obj = FpeNet('fpenet_b1_fp32.trt')

# python3 test.py --input test.png

I tried to do the same as Aref, but I also got an unsatisfactory result. Here are the config, steps, and results. I used your recommended resized picture as the input.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:21:01.0 Off |                    0 |
| N/A   52C    P0    28W /  70W |   5457MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

tlt-converter tlt_fpenet_vdeployable_v1.0/model.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -t fp32 -e fpenet_b1_fp32.trt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 1, 80, 80)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile max shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Detected 1 inputs and 2 output network tensors.

[images: landmark outputs]


Please retry with the modification below. Change

img_np = img_np.astype(np.float32) / 255

to

img_np = img_np.astype(np.float32)
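In context, a minimal sketch of the full corrected preprocessing this change implies (assumed, since test.py is not shown inline): grayscale, resize to 80x80, cast to float32 without dividing by 255, and reshape to the 1x1x80x80 input.

# Assumed corrected preprocessing for input_face_images:0 (1x1x80x80).
import cv2
import numpy as np

def preprocess(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # single-channel input
    img = cv2.resize(img, (80, 80))
    img_np = img.astype(np.float32)                # keep raw 0-255 values, no /255
    return img_np.reshape(1, 1, 80, 80)            # NCHW batch of 1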


This solved the issue. Thanks!

Did you use this on a Jetson NX or on a PC? What is the approximate FPS for the 68-point facial landmarks? Is it faster than the dlib model? Thanks!

Hi @154221990 ,

I used this on an x86 PC with an NVIDIA GTX 1080 Ti. The approximate frame rate is about 1500 fps. Yes, it is much faster than the dlib model: I got about 1000 fps on my CPU using dlib's 5-point landmark detector. I did not test dlib's 68-point landmark detector, though.
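If you want to reproduce a rough throughput figure like this, a minimal timing-loop sketch is below; the fpenet_obj.predict call is a placeholder for whatever single-inference entry point your wrapper exposes, not an actual API from test.py.

import time

def benchmark(run_once, iters=1000, warmup=50):
    """Rough FPS estimate for any callable that performs one inference."""
    for _ in range(warmup):          # let clocks and caches settle
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return iters / (time.perf_counter() - start)

# Example (hypothetical wrapper and input):
# fps = benchmark(lambda: fpenet_obj.predict(face_80x80))
# print(f"~{fps:.0f} fps")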

Also, refer to NVIDIA NGC for more info and the official benchmarks.


Many thanks for your reply.

Hi @Aref and @Morganh,
I successfully created the .engine file with the command
./tlt-converter /opt/nvidia/deepstream/deepstream-5.1/samples/models/tlt/facial_landmarks/model.etlt -k nvidia_tlt -p input_face_images:0,1x1x80x80,1x1x80x80,1x1x80x80 -b 1 -e facial_landmark.engine -t fp16
and then used the test.py mentioned in the question. As suggested by @Morganh, I replaced the old line with
img_np = img_np.astype(np.float32)
but I am still getting the exact same issue. Please find attached the test images that I used.
@Aref, have you made any other modifications to the file? If so, could you please tell me what you changed?
I am working on a Jetson Nano 2GB, and I have also tried an fp32 engine file.

Thanks in advance.
[images: landmarks, frame_288]

See my comments above; please check whether you can run inference correctly against the default png file.

Yes, it works on the default PNG. Can you please point out what I am missing?

@devesh
As mentioned above, please crop the face bounding box to be a square bbox. Then run your code to do inference against this square bbox.
Currently, the picture you shared is not square; its resolution is 221x240.
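If re-cropping a square box from the original frame is not possible, one possible workaround (an assumption on my part, not advice given in this thread, which recommends re-cropping a square bbox) is to pad the existing crop out to a square before resizing:

# Hypothetical alternative when only a non-square crop (e.g. 221x240) is
# available: pad it to a square with replicated borders, then resize to 80x80.
import cv2

def pad_to_square(img):
    h, w = img.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    bottom = side - h - top
    left = (side - w) // 2
    right = side - w - left
    return cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_REPLICATE)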

Hello,
I'm testing the FPEnet model on the following system:
Jetson Nano 2GB
JetPack 4.5.1
TensorRT 7.1.3
CUDA 10.2

I'm following the testing method above, but I get the following error during context.execute_async:
[TensorRT] ERROR: Parameter check failed at: engine.cpp::resolveSlots::1227, condition: allInputDimensionsSpecified(routine)

I use the model.etlt from the NGC deployable v1.0 model plus the png file from Morganh above, and run the following command:
python3 test.py --input test.png

Notes:
I also tested with the fpenet trainable model: I trained it with TLT, exported the model, and converted it with tlt-converter, but the same error occurred.

What am I missing?
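For context, the engine built above has a dynamic batch dimension (the converter logs show an input of (-1, 1, 80, 80)), and with the TensorRT Python API a dynamic-shape engine requires the concrete input shape to be set on the execution context before execute_async. A minimal sketch of that check is below; it is one common cause of this exact error, not a confirmed diagnosis of the script (the engine filename is the one used earlier in the thread).

# Sketch: with a dynamic (-1, 1, 80, 80) input, the concrete shape must be set
# on the execution context, otherwise TensorRT reports
# "condition: allInputDimensionsSpecified(routine)" at inference time.
import pycuda.autoinit  # ensures a CUDA context exists
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("fpenet_b1_fp32.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
idx = engine.get_binding_index("input_face_images:0")
context.set_binding_shape(idx, (1, 1, 80, 80))
assert context.all_binding_shapes_specified  # must be True before execute_async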