Understanding settings for secondary classifier

dangraf · March 19, 2021, 11:22am

I’m still struggling with DeepStream in Python while trying out different models. I have some questions that I can’t find the answers to in the DeepStream documentation.

• Hardware Platform (Jetson / GPU)
Jetson agx xavier
• DeepStream Version
5.1
• JetPack Version (valid for Jetson only)
5.1
• Issue Type
Questions

1. I’m trying out different models, such as “facial landmarks” (https://ngc.nvidia.com/catalog/models/nvidia:tlt_fpenet). I’m not able to fill in the configuration file correctly: I need to define the input and output names of the network, but they aren’t documented on the model’s page. How do I find the input/output names for any etlt model, since this is a common problem? I’ve seen your recommendations to look at other configuration files, but that does not actually answer the question. In this case, there is no configuration file.
2. The facial landmark model can produce 68, 80 or 104 outputs. How do I select which output I want? Is this set in the configuration file somehow?
3. When looking at the DeepStream examples for e.g. secondary classifiers like car-make or car-color, I can’t find which function converts the tensor output from the model into the metadata structure that is passed to the pads/sinks. When is this done automatically, and when do I need to create my own converter? If I e.g. use my own bounding-box model, how can I re-use the functions you are using? Do I need to write my own converter function for the facial landmark model, or is it done magically as in the other examples? How do I scale the coordinates to the image?
4. The image is normalized, resized and converted to a tensor as input to the primary detector. But for the secondary classifier, is the image already normalized? Is that why the scaling factor is set to 1 in the examples, e.g. car-color?
5. I do not understand how to set the network-type (0: detector, 1: classifier, etc.). My model is not a detector, since it does not produce bounding boxes, and it’s not a classifier either. Does 0 mean that the output is treated as a regression problem, and does 1 convert one-hot encoded outputs to classes? What exactly is this switch doing?
6. For the primary classifier/detector it’s common to define the shape of the input tensor (infer-dims=3;160;160), but I can’t find this switch in the examples I’ve found for the secondary classifier. Does the network detect the input shapes automatically, or when is this switch needed? Is the processing able to both upsize and downsize an image?
7. I would like to save the metadata to a file in JSON format and have been looking at the “gst-nvmsgconv” plugin (Gst-nvmsgconv — DeepStream 6.1.1 Release documentation), but I can’t understand how to use it to convert the metadata to JSON. I was expecting that the “payload-type” property would allow me to convert it to e.g. JSON, since there seem to be different formats. But this property seems to control how much information is stored in the message. Another question is that “PAYLOAD_DEEPSTREAM” is one setting, but it’s not defined which number it corresponds to. I’m guessing it’s 0 or 1; why is this not defined?
8. Regarding gst-nvmsgconv, where is the final result stored, so that I can access it using a pad and save it to a file?

Fiona.Chen · March 22, 2021, 1:29am

First, for model-related questions, please create a topic in the TLT forum: Latest Intelligent Video Analytics/Transfer Learning Toolkit topics - NVIDIA Developer Forums

The inference plugin gst-nvinfer (Gst-nvinfer — DeepStream 6.3 Release documentation) converts the model output to metadata. The source code is in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer/ and /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer. Before you investigate the implementation of DeepStream, please make sure you are familiar with GStreamer (https://gstreamer.freedesktop.org/) coding. By default, DeepStream supports some specific types of models, which we define as detector, classifier, segmentation and instance segmentation. Facial landmark is not any one of them, so it may need a lot of customization to integrate the model with DeepStream. At a minimum, you must read and understand the gst-nvinfer source code.
What kind of converter do you want? If you want to use your own bbox model, you need to customize the pre-processing of the nvinfer plugin.
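On the “How do I scale the coordinates to the image?” part of the question, the arithmetic can be sketched in plain Python. This is only an illustration under stated assumptions: the raw landmark tensor has already been read out (for example by enabling nvinfer’s output-tensor-meta property), the landmarks are expressed in pixels of the network input, and the object crop was stretched to the network input without preserving the aspect ratio. The function name `landmarks_to_frame` and all of its parameters are hypothetical and not part of DeepStream.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def landmarks_to_frame(
    points: List[Point],
    net_w: float,
    net_h: float,
    box_left: float,
    box_top: float,
    box_w: float,
    box_h: float,
) -> List[Point]:
    """Map landmarks from network-input pixels to full-frame pixels.

    Assumes the object crop (box_*) was resized to net_w x net_h
    without padding or aspect-ratio preservation (an assumption).
    """
    sx = box_w / net_w  # horizontal scale from network input to crop
    sy = box_h / net_h  # vertical scale from network input to crop
    return [(box_left + x * sx, box_top + y * sy) for x, y in points]


# One landmark at the centre of an 80x80 network input, for a face
# box at (100, 50) with size 160x200 in the frame.
frame_points = landmarks_to_frame([(40.0, 40.0)], 80, 80, 100.0, 50.0, 160.0, 200.0)
```

If the model instead outputs normalized coordinates in [0, 1], the same function applies with net_w and net_h set to 1.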

Fiona.Chen · March 22, 2021, 2:09am

Yes. gst-nvinfer also does the resizing, normalization and conversion needed to match the model input. The factor value is determined by the model, not by DeepStream. For the sample car-color model, the normalization factor is 1.
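As a small illustration of why the factor depends on the model, the gst-nvinfer documentation describes the per-pixel normalization as y = net-scale-factor * (x - mean). The sketch below reproduces only that arithmetic in plain Python; the function name `preprocess_pixel` is hypothetical, and the 1/255 case is an assumed example of a model trained on inputs in [0, 1], not a setting taken from this thread.

```python
def preprocess_pixel(x: float, net_scale_factor: float = 1.0, offset: float = 0.0) -> float:
    """Apply y = net_scale_factor * (x - offset) to one pixel value."""
    return net_scale_factor * (x - offset)


# With a factor of 1 and no offset (as for the car-color sample),
# pixel values reach the network unchanged, in the range 0..255.
unchanged = [preprocess_pixel(v) for v in (0, 128, 255)]

# A model trained on inputs in [0, 1] would need a factor of about 1/255
# (assumed example, not from the thread).
scaled = [preprocess_pixel(v, net_scale_factor=1.0 / 255.0) for v in (0, 128, 255)]
```

In other words, a factor of 1 does not mean the image was normalized earlier; it means the model expects raw 0..255 values.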

Please refer to Object Detection — Transfer Learning Toolkit 3.0 documentation (nvidia.com)
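To give a rough idea of what the classifier network type does with a model’s output, here is a plain-Python sketch of the usual approach: treat the output layer as per-class probabilities, take the highest one, and report its label only if it exceeds a threshold. This is an approximation of the behaviour as I understand it, not the actual gst-nvinfer code; the function name `pick_class`, the label list and the threshold value are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_class(
    probabilities: List[float],
    labels: List[str],
    threshold: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Return (label, probability) of the best class, or None if too weak."""
    if not probabilities:
        return None
    # Index of the highest per-class probability.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    # Only report the class when it is strictly above the threshold.
    if probabilities[best] <= threshold:
        return None
    return labels[best], probabilities[best]


labels = ["black", "blue", "red"]  # hypothetical label file contents
confident = pick_class([0.1, 0.2, 0.7], labels)
uncertain = pick_class([0.4, 0.3, 0.3], labels)
```

A landmark model’s output is a list of coordinates rather than per-class scores, which is why neither the detector nor the classifier interpretation fits it.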

infer-dims currently works only with UFF models.

gst-nvmsgconv is a sample, and it is open source too: /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvmsgconv and /opt/nvidia/deepstream/deepstream/sources/libs/nvmsgconv. It also requires good GStreamer coding skills before you start with these implementations. For example, “PAYLOAD_DEEPSTREAM” is a sample that defines some messages and a format that can be transferred to a server. If you go through the code, you will understand how it is defined and used to match the messages as defined.

Please refer to the code /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test4.

Deepstream is only a SDK, you need to understand the interfaces and usages with document and source code.

Morganh · March 22, 2021, 3:42am

@dangraf
For question 1, see the reference in How to find the input/output layers names of tlt/etlt model - #8 by Morganh . You can find the command in that shell script.
For question 2, see Transfer Learning Toolkit — Transfer Learning Toolkit 3.0 documentation

dangraf · March 22, 2021, 9:25am

Thanks Morganh, but I don’t think those answer any of my questions. I can’t find any method for finding the input/output names of the etlt model. Do you mean that I should use the tlt-converter? I would still need to know the input names to be able to use it, and the gaze-net has several inputs.

Regarding question 2, the documentation only tells me how to set up the model for training. I’m interested in using the pre-trained etlt model. What settings were used?

Why isn’t this documented on the page for the pre-trained model?

dangraf · March 22, 2021, 9:42am

Thanks for your answers,
I was thinking of your comment “Deepstream is only a SDK, you need to understand the interfaces and usages with document and source code.”

I almost agree with you, but I was hoping that the documentation would be enough to use the existing components. Take “gst-nvmsgconv”, for example: why should I have to read the source code to change between different message formats? Why is there no link to the open-source page of that module?
Another example is the “infer-dims” property, which only applies to UFF models. Should I find that out from the source code too? I was expecting this to be documented, to help me as a user know which parameters to use.

I guess we have different views of what should be documented… Sorry for my complaints. I appreciate the help you are giving, but I am frustrated that it’s so hard to deploy a model using DeepStream when it’s so easy in theory.

Fiona.Chen · March 22, 2021, 9:57am

If the message we have encapsulated meets your requirements, you can use the plugin directly.
For most users, the message is customized. Different people and different projects need different messages. We provide the interface to contain and transfer these messages, but generating the message has to be implemented by the user. That is why we provide such an example. Alternatively, you can ignore our code and just refer to the interface of nvmsgconv (Gst-nvmsgconv — DeepStream 6.3 Release documentation) to implement your own message conversion.
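For the original goal of saving metadata to a JSON file, one option that does not involve nvmsgconv at all is to build the record yourself and write it with Python’s standard json module, for instance from a pad probe. The sketch below shows only the serialization step; the field names and the helper names `object_to_record`, `to_json_line` and `append_json_line` are hypothetical, and reading the values out of the DeepStream metadata is left out.

```python
import json
from typing import Any, Dict


def object_to_record(
    frame_number: int,
    label: str,
    confidence: float,
    left: float,
    top: float,
    width: float,
    height: float,
) -> Dict[str, Any]:
    """Collect the fields of one detected object into a plain dict."""
    return {
        "frame": frame_number,
        "label": label,
        "confidence": confidence,
        "bbox": {"left": left, "top": top, "width": width, "height": height},
    }


def to_json_line(record: Dict[str, Any]) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(record, sort_keys=True)


def append_json_line(path: str, record: Dict[str, Any]) -> None:
    """Append one record to a JSON Lines file (one object per line)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(record) + "\n")


record = object_to_record(12, "car", 0.91, 100.0, 50.0, 160.0, 200.0)
line = to_json_line(record)
```

Writing one JSON object per line keeps the file appendable while the pipeline is running, and each line can be parsed independently afterwards.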

Morganh · March 23, 2021, 8:33am

For the input/output name, refer to How to find the input/output layers names of tlt/etlt model - #3 by Morganh
In the doc link I shared with you, please see the table. num_keypoints supports 68, 80 or 104. You can change it in the training spec file.
num_keypoints, Number of facial keypoints, 68, 80, 104

For further questions, please create a topic in the TLT forum.
