How to output probability of each class in audio-transform?

Hi,

In the DeepStream 6.4 documentation, it’s not clear how to output the probability of each predicted class. Is it possible to do so? Thanks for any comments.

Hi @Fiona.Chen,

Is it possible to output probabilities in audio-transform, like this:

0 INPUT kFLOAT input 1x512x128
1 OUTPUT kFLOAT output 2

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

INFO: <bus_callback:137>: Pipeline ready

INFO: <bus_callback:123>: Pipeline running

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
label:[ other ] source_id:[0] probability: [0.538]
label:[ other ] source_id:[0] probability: [0.61]
label:[ other ] source_id:[0] probability: [0.518]
label:[ other ] source_id:[0] probability: [0.548]
label:[ cry ] source_id:[0] probability: [0.724]

All I know is that I could set a classifier-threshold to get the class with maximum probability.

Thanks.

Are you talking about the Gst-nvinferaudio property “audio-transform” (see the Gst-nvinferaudio page in the DeepStream 6.4 documentation)?
Please describe your model in detail. Why do you want to use nvinferaudio and “audio-transform” with this model? What are the input and output of the model?

We deployed “Efficient Pre-Trained CNNs for Audio Pattern Recognition” on DeepStream. See how-audio-transform-works for details.

Model: pre-trained audio transformer
Input: an audio waveform
Output: the label of the class with the maximum probability, like:

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           1x512x128       
1   OUTPUT kFLOAT output          2               

...
Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

** INFO: <bus_callback:137>: Pipeline ready

** INFO: <bus_callback:123>: Pipeline running

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
### label:[ other ] source_id:[0]
### label:[ other ] source_id:[0]
### label:[ cry ] source_id:[0]
### label:[ cry ] source_id:[0]

In addition to the label, we want to compute the confidence of the label from the probability. That’s why I asked whether it is possible to output the probability of that label.

Thanks.

So the request is to output the probability, right?

Yes, we need the probability. And if possible, could DeepStream output the confidence of the predicted label?

There is a “confidence” field in NvDsAudioFrameMeta (see the _NvDsAudioFrameMeta Struct Reference in the NVIDIA DeepStream SDK API Reference). This is the probability of the label.
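For reference, the relevant fields look roughly like this (an abridged sketch only; check nvdsmeta.h in your DeepStream install for the exact definition):

/* Abridged sketch of NvDsAudioFrameMeta; see nvdsmeta.h for the full struct. */
typedef struct _NvDsAudioFrameMeta {
  /* ... */
  guint source_id;                     /* id of the source this frame came from */
  gint class_id;                       /* inferred class id */
  gfloat confidence;                   /* probability of the inferred label */
  gchar class_label[MAX_LABEL_SIZE];   /* inferred label string */
  GList *classifier_meta_list;         /* list of NvDsClassifierMeta */
  /* ... */
} NvDsAudioFrameMeta;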

Is there any example of outputting the probability of the label? Thanks again.

Please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-audio/deepstream_audio_main.c; there is a sample of getting the NvDsAudioFrameMeta from the GstBuffer.
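A minimal sketch of such a pad probe, assuming the gst_buffer_get_nvds_batch_meta() access pattern used by the sample app (verify field and type names against your DeepStream version):

#include <gst/gst.h>
#include "gstnvdsmeta.h"

/* Probe attached downstream of nvinferaudio; prints the label and the
 * confidence (probability of the predicted class) for each audio frame. */
static GstPadProbeReturn
audio_meta_probe (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
       l_frame = l_frame->next) {
    NvDsAudioFrameMeta *frame_meta = (NvDsAudioFrameMeta *) l_frame->data;
    g_print ("### label:[ %s ] source_id:[%d] probability:[%f]\n",
        frame_meta->class_label, frame_meta->source_id,
        frame_meta->confidence);
  }
  return GST_PAD_PROBE_OK;
}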

Thanks, it works.

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT input           1x625x128       min: 1x1x625x128     opt: 2x1x625x128     Max: 2x1x625x128     
1   OUTPUT kFLOAT output          3               min: 0               opt: 0               Max: 0               


Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

** INFO: <bus_callback:137>: Pipeline ready

** INFO: <bus_callback:123>: Pipeline running

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
### label:[ people_voice ] source_id:[0] probability:[34.958351]
### label:[ people_voice ] source_id:[0] probability:[33.954536]
### label:[ people_voice ] source_id:[0] probability:[41.512405]
### label:[ people_voice ] source_id:[0] probability:[43.433502]

The next request is to output the probability of each class.
Is there any example? Or could you tell me how to do it in deepstream_audio_main.c?

I appreciate your kind help.

There are only two classes in your model (the output tensor dimension is 2), so the probability of the other class can be calculated from the first.
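If the two outputs are softmax-normalized (an assumption about this model), the second probability is just the complement of the reported confidence:

/* Sketch, assuming the 2-class output is softmax-normalized so the two
 * probabilities sum to 1; frame_meta is the NvDsAudioFrameMeta from a probe. */
float p_best  = frame_meta->confidence;  /* probability of the reported label */
float p_other = 1.0f - p_best;           /* probability of the other class */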

For models which output more classes, why do you need the probability of every class?

In real applications, we have 3-5 classes. For each input frame, we would like the output to be multi-label (e.g., 3 labels), with a confidence for every label, so that we can decide the final label(s) according to a confidence threshold.

To do so, we need the probability of every class, and then compute the confidence of every class from the probabilities of multiple models.

If there are multiple classes in one audio frame, please get them one by one from classifier_meta_list in NvDsAudioFrameMeta, as sketched below.
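A sketch of that traversal, assuming the classifier metadata mirrors the video-side layout (each NvDsClassifierMeta holding a label_info_list of NvDsLabelInfo entries):

#include <gst/gst.h>
#include "gstnvdsmeta.h"

/* Walk all classifier results attached to one audio frame. */
static void
print_classifier_results (NvDsAudioFrameMeta *frame_meta)
{
  for (GList *l_cls = frame_meta->classifier_meta_list; l_cls != NULL;
       l_cls = l_cls->next) {
    NvDsClassifierMeta *cls_meta = (NvDsClassifierMeta *) l_cls->data;
    for (GList *l_label = cls_meta->label_info_list; l_label != NULL;
         l_label = l_label->next) {
      NvDsLabelInfo *label_info = (NvDsLabelInfo *) l_label->data;
      g_print ("class id:[%d] label:[%s] prob:[%f]\n",
          label_info->result_class_id, label_info->result_label,
          label_info->result_prob);
    }
  }
}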

Thanks for the hint. It seems classifier_meta_list is of type NvDsClassifierMetaList, which is a typedef of GList. Where can I get more information about GList?

Please google what GList is. All header files are available. The API documents have also been provided.

There are lots of samples of getting different metadata from GLists in our sample code. Please start with the deepstream-test1 sample code (see C/C++ Sample Apps Source Details in the DeepStream 6.4 documentation).

Thanks again. Following the deepstream-test5 example, I can get the probability of the label info from NvDsClassifierMeta.

However, in NvDsLabelInfo, result_label and result_prob only store the label of the classified object and the best probability in one audio frame. Do I misunderstand something?

Yes, you are right. For a multi-class model, it outputs the classes inferred from the frame. It depends on your model.

Thanks. In sample_apps/deepstream-audio, sonyc_audio_classify.onnx is a 31-class model. Is it possible to output the probability of each class?

The output node of this ONNX file has shape (1,31). Does this shape match the output of a multi-class model? If not, what is the expected output shape of a multi-class model?

In our case, the output of our ONNX model has shape (1,3). When using NvDsLabelInfo to get result_prob and result_label for one audio frame, we only get the best probability and the classified label. So is the problem that our model is not a multi-class one, or that we wrote the code incorrectly?

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT input           1x625x128       min: 1x1x625x128     opt: 2x1x625x128     Max: 2x1x625x128     
1   OUTPUT kFLOAT output          3               min: 0               opt: 0               Max: 0               

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

** INFO: <bus_callback:137>: Pipeline ready

** INFO: <bus_callback:123>: Pipeline running

max_fps_dur 8.33333e+06 min_fps_dur 2e+08
### label:[people_voice], probability:[34.958351]
### label:[people_voice], probability:[33.954536]
### label:[people_voice], probability:[58.582298]
### label:[people_voice], probability:[64.005959]
### label:[people_voice], probability:[65.629326]
### label:[people_voice], probability:[73.541138]
...
** INFO: <bus_callback:160>: Received EOS. Exiting ...

Quitting
App run successful

Thanks again.

Yes. It is possible.

The output tensor is not in “CHW” format, and the default tensor parsing algorithm does not support it.
Please customize your own postprocessing tensor parsing algorithm for your model. The “parse-classifier-func-name” parameter of nvinferaudio is used in the same way as in nvinfer.

Even though gst-nvinferaudio is not open source, the postprocessing is open source. Please refer to /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp

This model’s output dimension is “3”, not “1x3”. We don’t know whether this model is multi-class or not. Please consult the model owner about the meaning of the output tensor data and the postprocessing algorithm for such tensor output. The only thing we can tell you is that the default postprocessing tensor output parsing algorithm in gst-nvinferaudio is not for such a model; you need to customize your own tensor output parsing algorithm according to your model, e.g. as sketched below.
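As an illustration only, a custom classifier parser could emit one attribute per class so that every class probability survives into the metadata instead of just the arg-max label. This sketch follows the NvDsInferClassiferParseCustomFunc interface from nvdsinfer_custom_impl.h; the function name is hypothetical, and treating the output buffer as a flat array of per-class probabilities is an assumption about this model (compare the sample parsers under sources/libs/nvdsinfer_customparser):

#include <cstring>
#include <string>
#include <vector>
#include "nvdsinfer_custom_impl.h"

/* Hypothetical custom classifier parser: emits one attribute per class
 * whose probability exceeds the threshold, so all class probabilities
 * are available downstream in the classifier metadata. */
extern "C" bool
NvDsInferParseCustomAudioAllProbs (
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    float classifierThreshold,
    std::vector<NvDsInferAttribute> &attrList,
    std::string &descString)
{
  if (outputLayersInfo.empty ())
    return false;

  const NvDsInferLayerInfo &layer = outputLayersInfo[0];
  /* Assumption: the output buffer is a flat float array of per-class
   * probabilities, e.g. 3 values for this model. */
  const float *probs = (const float *) layer.buffer;
  unsigned int numClasses = layer.inferDims.d[0];

  for (unsigned int c = 0; c < numClasses; c++) {
    if (probs[c] < classifierThreshold)
      continue;
    NvDsInferAttribute attr;
    attr.attributeIndex = 0;                 /* single "attribute" group */
    attr.attributeValue = c;                 /* class id */
    attr.attributeConfidence = probs[c];     /* per-class probability */
    /* attributeLabel must be heap-allocated; freed by the framework.
     * A real parser would look up the class name instead of the id. */
    attr.attributeLabel = strdup (std::to_string (c).c_str ());
    attrList.push_back (attr);
    descString.append (attr.attributeLabel).append (" ");
  }
  return true;
}

It would then be selected in the nvinferaudio config with parse-classifier-func-name=NvDsInferParseCustomAudioAllProbs and custom-lib-path pointing at the compiled library, following the nvinfer classifier-parser convention.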


In the sample_apps/deepstream-audio sonyc_audio_classify.onnx case, if I change the shape of the output tensor to 1x1x31, will the default tensor parsing algorithm support outputting the multi-class result?