Post-processing of Facial Landmark Model Output

Hi Team,

I am using the facial landmark detection model; the code is below. I have two problems:

  1. How do I post-process the output to get the face keypoints?
  2. Can you guide me on how to feed the input image to the model, i.e., the pre-processing steps before inference (read, grayscale, reshape, or whatever the steps are for the facial landmark model)?

Code:

import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Load the TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    runtime = trt.Runtime(TRT_LOGGER)
    engine = runtime.deserialize_cuda_engine(engine_data)
    return engine

# Perform inference using the TensorRT engine
def inference(engine, input_data):
    # Create an execution context from the engine
    context = engine.create_execution_context()

    # Allocate buffers for input and output
    inputs, outputs, bindings = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size * trt.int32.itemsize
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        device_mem = cuda.mem_alloc(size)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(device_mem)
        else:
            outputs.append(device_mem)

    # Create a CUDA stream
    stream = cuda.Stream()

    # Copy input data to the GPU
    cuda.memcpy_htod_async(inputs[0], input_data, stream)

    # Run inference
    context.execute_async_v2(bindings, stream.handle, None)

    # Copy output data from the GPU
    output_data = np.empty(engine.get_binding_shape(1), dtype=np.float32)
    cuda.memcpy_dtoh_async(output_data, outputs[0], stream)
    stream.synchronize()


    return output_data

# Example usage
engine_path = './faciallandmarks.etlt_b1_gpu0_int8_1650.engine'
input_data = np.random.random((1, 1, 80, 80)).astype(np.float32)

# Load the engine
engine = load_engine(engine_path)

# Perform inference
output_data = inference(engine, input_data)

print(output_data)

Output:

[[[[-351.4596   -234.18211  -370.86075  ... -278.5809   -360.16522
    -193.51431 ]
   [-339.02295  -244.13142  -504.4302   ... -277.0885   -559.02704
    -130.08746 ]
   [-377.3278   -284.1774   -417.2494   ... -272.36258  -372.72623
    -235.17705 ]
   ...
   [-349.22098  -247.11621  -487.01892  ... -252.58833  -559.02704
    -157.6968  ]
   [-438.39166  -291.515    -436.7749   ... -287.16217  -424.08954
    -196.62347 ]
   [-217.89012  -130.83366  -264.5275   ... -150.98102  -305.81714
    -104.96546 ]]

  [[-191.30396  -227.23907  -255.89716  ... -262.1989   -255.52205
    -176.82489 ]
   [-140.21457  -398.66235  -208.40878  ... -410.06555  -253.79655
    -291.08206 ]
   [-247.34474  -302.33524  -318.23972  ... -278.92865  -269.77606
    -211.85976 ]
   ...
   [-161.74564  -371.5797   -220.9373   ... -463.7807   -273.60214
    -287.10593 ]
   [-272.1017   -291.9823   -299.33438  ... -305.18604  -279.60382
    -175.92464 ]
   [-113.28199  -246.59453  -136.0134   ... -294.60806  -154.01846
    -206.60828 ]]

  [[-182.20886  -201.31818  -208.17303  ... -227.35895  -202.31386
    -149.7728  ]
   [-155.05754  -255.123    -242.67705  ... -271.39847  -267.68384
    -171.21812 ]
   [-213.87903  -247.23418  -257.11435  ... -213.53436  -221.92102
    -192.58687 ]
   ...
   [-170.14586  -218.35956  -212.38551  ... -280.32126  -257.65048
    -179.4516  ]
   [-243.25146  -242.44727  -253.05505  ... -242.2175   -236.12856
    -154.32993 ]
   [-105.541794 -129.28484   -99.0316   ... -172.1755   -131.42937
    -132.8463  ]]

  ...

  [[-367.2959   -291.04105  -378.88754  ... -336.38486  -352.9768
    -250.81123 ]
   [-420.2538   -283.42694  -573.3317   ... -296.72324  -617.31177
    -157.85078 ]
   [-350.5903   -399.45703  -445.4827   ... -417.29907  -380.70584
    -372.2962  ]
   ...
   [-445.2554   -272.97174  -623.5622   ... -285.1316   -650.1548
    -185.69347 ]
   [-403.20728  -436.1639   -415.0262   ... -454.57416  -385.70615
    -334.90747 ]
   [-255.01604  -173.19267  -301.0417   ... -218.30917  -359.4545
    -138.98595 ]]

  [[-166.39908  -162.6162   -242.39203  ... -172.14523  -267.67514
    -131.92213 ]
   [-197.57199  -449.11008  -167.06947  ... -443.93854  -189.86256
    -285.5361  ]
   [-255.94339  -233.53337  -302.4394   ... -211.69797  -315.89496
    -148.87329 ]
   ...
   [-219.4074   -439.3895   -185.16986  ... -504.8478   -190.82025
    -286.4938  ]
   [-267.96243  -234.73048  -288.9838   ... -226.39856  -273.6607
    -112.19366 ]
   [-146.28752  -257.3799   -154.61946  ... -283.18976  -174.58736
    -188.95276 ]]

  [[-173.85258  -204.53004  -184.60422  ... -225.9503   -176.46785
    -212.2098  ]
   [-267.54547  -350.77728  -337.70096  ... -304.9894   -350.8188
    -244.21564 ]
   [-140.18623  -359.78543  -171.77698  ... -418.40054  -236.5359
    -321.88483 ]
   ...
   [-303.41196  -368.2539   -428.9031   ... -368.2954   -406.86017
    -285.2296  ]
   [-163.80664  -371.7824   -186.05714  ... -441.5228   -167.33517
    -292.82635 ]
   [-197.47298  -245.29495  -284.52393  ... -257.7901   -284.68997
    -171.07127 ]]]]

Thanks.

You can take a look at the topic Cannot infer with fpenet with TensorRT8.0.

I tried it, but I am getting the same output: Cannot infer with fpenet with TensorRT8.0 - #11 by chuongvodoi95

All output points are drawn in the image corner, so I am getting wrong facial landmarks.

I am able to run the facial landmark model inside a DeepStream application and get correct results, but I want to do it in a Python script.

Please suggest how we can post-process the output and get the landmarks correctly using a Python script.

Thanks.

The deployable_v1.0 model is compatible with the old docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3.

If you are using the latest docker, please download the deployable_v3.0 model:
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/fpenet/versions/deployable_v3.0/files/model.etlt'

Then, change the binding_to_type map below to

        binding_to_type = {
            #"input_face_images:0": np.float32,
            #"softargmax/strided_slice:0": np.float32,
            #"softargmax/strided_slice_1:0": np.float32
            "input_face_images": np.float32,
            "conv_keypoints_m80": np.float32,
            "softargmax": np.float32,
            "softargmax:1": np.float32
        }
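
For the pre-processing asked about in the first post: the "input_face_images" binding expects a 1x1x80x80 grayscale float32 tensor (the same shape as the random input in the script above), so the usual steps are to crop the detected face, convert it to grayscale, and resize to 80x80. Below is a minimal sketch using OpenCV; the (x, y, w, h) face-box layout is an assumption to adapt to your face detector, and whether the pixel range should stay 0-255 or be normalized depends on how the engine was built, so mirror your working DeepStream config.

import cv2
import numpy as np

def preprocess_face(image_bgr, face_box, input_size=80):
    """Crop the face, convert to grayscale, and resize to the model input.

    face_box is (x, y, w, h) in pixels; this layout is an assumption,
    adapt it to whatever your face detector returns.
    """
    x, y, w, h = face_box
    crop = image_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, (input_size, input_size))
    # NCHW float32 tensor for the "input_face_images" binding
    tensor = resized.astype(np.float32).reshape(1, 1, input_size, input_size)
    return np.ascontiguousarray(tensor)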

And also change the inference call to unpack the extra output

            # landmarks, probs = self._do_inference(
            _, landmarks, probs = self._do_inference(
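
With the three outputs unpacked, the post-processing from the first question is mostly scaling: assuming the "softargmax" binding gives one (x, y) pair per keypoint in the 80x80 model-input coordinate space and "softargmax:1" gives a confidence per keypoint, mapping back to the original image only needs the face-crop geometry. A minimal sketch, under the same (x, y, w, h) face-box assumption as the pre-processing sketch above (min_conf is an illustrative parameter, not something the model defines):

import numpy as np

def postprocess_landmarks(landmarks, probs, face_box, input_size=80, min_conf=0.0):
    """Map softargmax keypoints from the 80x80 crop space onto the image.

    landmarks: the "softargmax" output, reshaped to (num_points, 2).
    probs: the "softargmax:1" output, one confidence per keypoint.
    """
    x, y, w, h = face_box
    points = np.asarray(landmarks, dtype=np.float32).reshape(-1, 2)
    # Scale from model-input coordinates to crop size, then shift by the
    # crop origin so the points land on the original image
    points[:, 0] = points[:, 0] * (w / float(input_size)) + x
    points[:, 1] = points[:, 1] * (h / float(input_size)) + y
    conf = np.asarray(probs, dtype=np.float32).reshape(-1)
    keep = conf >= min_conf
    return points[keep], conf[keep]

If the points still all land in the image corner, one likely cause is reading a raw heatmap binding (the large negative values in the output above look like heatmap logits) instead of the softargmax outputs, which is exactly what the binding change addresses.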

Thanks @Morganh
