DeepStream 5.0 nvinferserver how to use upstream tensor meta as a model input

• Hardware Platform: Tesla T4
• DeepStream Version 5.0
• TensorRT V7.0
• NVIDIA GPU Driver Version 450.57

I have a face alignment custom model deployed successfully to Triton Inference Server, with 2 inputs:

  1. a 112x112x3 face image
  2. a 5 point landmark of that face image

The output of this model is an aligned face image.

I’m trying to deploy this custom model to the nvinferserver element of DeepStream 5, where the upstream element is a primary face detection model that also outputs landmarks.

The problem is that I don’t know how to pass the face landmarks (in the form of NvDsInferTensorMeta from the upstream face detection model) as the second input to this Triton custom model.

The Gst-nvinferserver File Configuration Specifications do not seem to mention how to map upstream tensor meta to a Triton model’s inputs.

Please give me advice. Thanks.

Hi

I think the way to go is to customize nvinfer so that the primary element attaches the landmarks as NvDsUserMeta to the object meta, and then have the face alignment stage extract and parse that meta. I haven’t used user meta myself, but it seems it was added for cases like this one.
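A rough sketch of what that attachment could look like, in a pad probe downstream of the primary detector. This is untested and the names `NVDS_USER_FACE_LANDMARK_META`, `FaceLandmarks`, and `attach_landmarks` are my own illustrations, not part of the SDK; only the `nvds_*` calls are real DeepStream APIs:

```c
/* Sketch (untested): attach per-object 5-point landmarks as NvDsUserMeta.
 * Requires the DeepStream SDK headers; type/function names below marked
 * as illustrative are assumptions, not SDK symbols. */
#include "gstnvdsmeta.h"

/* Illustrative custom meta type registered via the SDK helper. */
#define NVDS_USER_FACE_LANDMARK_META \
    (nvds_get_user_meta_type ("NVIDIA.NVINFER.USER_META_FACE_LANDMARK"))

/* Illustrative payload: 5-point landmarks for one face. */
typedef struct { float x[5], y[5]; } FaceLandmarks;

static gpointer copy_landmarks (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  /* Deep-copy the payload so each surface owns its landmarks. */
  return g_memdup (user_meta->user_meta_data, sizeof (FaceLandmarks));
}

static void release_landmarks (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  g_free (user_meta->user_meta_data);
  user_meta->user_meta_data = NULL;
}

/* Illustrative helper: call once per detected face object. */
static void
attach_landmarks (NvDsBatchMeta *batch_meta, NvDsObjectMeta *obj_meta,
    const FaceLandmarks *lm)
{
  NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
  user_meta->user_meta_data = g_memdup (lm, sizeof (FaceLandmarks));
  user_meta->base_meta.meta_type = NVDS_USER_FACE_LANDMARK_META;
  user_meta->base_meta.copy_func = copy_landmarks;
  user_meta->base_meta.release_func = release_landmarks;
  nvds_add_user_meta_to_obj (obj_meta, user_meta);
}
```

The downstream element would then iterate `obj_meta->obj_user_meta_list`, match on the meta type, and read the landmarks back out.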

Thank you for your answer, but this is not the solution I’m looking for.

I had high hopes when the DeepStream 5.0 Developer Preview announced Triton Inference Server integration, because it makes DeepStream so much more flexible.

But it seems the nvinferserver element falls short of that expectation, since it only supports Detection, Classification and Segmentation. There are many types of machine learning models that don’t fit into these three categories.
I still hope to get feedback from NVIDIA.

So the face alignment custom model needs 2 input layers? And what are the landmarks used for — are they independent of the model or not?

Yes, the face alignment custom model needs 2 input layers: one for the face image and one for the corresponding face landmarks. Each face has its own landmarks.
The face landmarks are one of the outputs of the upstream face detection model, as in the RetinaFace model.

OK, currently nvinfer/nvinferserver don’t support models with 2 or more input layers, and we also don’t yet support customizing the preprocess the way the postprocess can be customized; we may add this support in a later release.
One solution here:
Combine your 2 models (face detection and face alignment) in one inferserver plugin, add a Triton custom backend model between the two to handle the postprocess and preprocess, then add an ensemble model to connect the 3 models. The pipeline is as follows:
modelA (face detection) -> custom Triton backend (postprocess for A + preprocess for B) -> modelB (face alignment)
For ensemble models, refer to https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/models_and_schedulers.html#ensemble-models
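A minimal sketch of what the ensemble’s config.pbtxt could look like for this pipeline. All model names (`face_detection`, `pre_post_custom_backend`, `face_alignment`) and tensor names are placeholders you would replace with your own; shapes are illustrative:

```
# Sketch of an ensemble config.pbtxt; names/shapes are assumptions.
name: "face_pipeline"
platform: "ensemble"
max_batch_size: 16
input [
  { name: "IMAGE", data_type: TYPE_FP32, dims: [ 3, -1, -1 ] }
]
output [
  { name: "ALIGNED_FACE", data_type: TYPE_FP32, dims: [ 3, 112, 112 ] }
]
ensemble_scheduling {
  step [
    {
      # Step 1: detector produces boxes + 5-point landmarks.
      model_name: "face_detection"
      model_version: -1
      input_map  { key: "INPUT"     value: "IMAGE" }
      output_map { key: "BBOXES"    value: "det_bboxes" }
      output_map { key: "LANDMARKS" value: "det_landmarks" }
    },
    {
      # Step 2: custom backend does postprocess for A + preprocess for B
      # (crop the face, select its landmarks).
      model_name: "pre_post_custom_backend"
      model_version: -1
      input_map  { key: "IMAGE"          value: "IMAGE" }
      input_map  { key: "BBOXES"         value: "det_bboxes" }
      input_map  { key: "LANDMARKS"      value: "det_landmarks" }
      output_map { key: "FACE_CROP"      value: "face_crop" }
      output_map { key: "FACE_LANDMARKS" value: "face_landmarks" }
    },
    {
      # Step 3: alignment model consumes both tensors.
      model_name: "face_alignment"
      model_version: -1
      input_map  { key: "FACE_IMAGE" value: "face_crop" }
      input_map  { key: "LANDMARKS"  value: "face_landmarks" }
      output_map { key: "ALIGNED"    value: "ALIGNED_FACE" }
    }
  ]
}
```

The `input_map`/`output_map` entries are what wire the intermediate tensors (`det_landmarks`, `face_crop`, …) between steps, so to nvinferserver the ensemble looks like a single model with one image input.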

Thank you for your solution.

I hope that in the next release of DeepStream we will have an nvinferserver element with a CUSTOM mode that supports multiple NvDsInferTensorMeta inputs and maps those tensors to the corresponding Triton model inputs.
There are many machine learning models that simply don’t operate on images in PROCESS_MODE_FULL_FRAME or PROCESS_MODE_CLIP_OBJECTS mode; for example, an action recognition model operates on a time series of human skeletons, a graph structure.

Supporting this feature would make DeepStream much more flexible, widely extending the possible uses of the SDK.
I accept this as a workaround solution.