Interpret mask_params data from PeopleSegNetV2

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (For bugs: include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (For new requirements: include the module name, i.e. which plugin or which sample application, and the function description.)


How should the mask output data from PeopleSegNet (v2.0.2) be interpreted?

The mask output layer is 'mask_head/mask_fcn_logits/BiasAdd', as described in NVIDIA-AI-IOT/deepstream_tao_apps (sample apps demonstrating how to deploy models trained with TAO on DeepStream).

Each detected object has ‘mask_params’ metadata attached as below:

Could you please elaborate on the format of the output and how to convert it to a pixel mask to overlay on an image?
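For context while reading the answers below, the mask_params member attached to each object's metadata is an NvOSD_MaskParams struct declared in nvll_osd_struct.h. A rough sketch of its layout (field names taken from the DeepStream headers; exact types and order should be double-checked against your SDK version):

```c
/* Sketch of NvOSD_MaskParams as declared in nvll_osd_struct.h
 * (DeepStream 5.0+). Field order and types are approximate. */
typedef struct {
  float *data;          /* pointer to the raw float mask values  */
  unsigned int size;    /* size of the data buffer in bytes      */
  float threshold;      /* threshold for binarizing the mask     */
  unsigned int width;   /* mask width in cells  (e.g. 28)        */
  unsigned int height;  /* mask height in cells (e.g. 28)        */
} MaskParamsSketch;

/* Number of float elements in the mask buffer. */
static unsigned int mask_num_elems(const MaskParamsSketch *m) {
  return m->size / (unsigned int)sizeof(float);
}
```

With width and height both 28, a single-class mask would hold width * height = 784 floats; a larger buffer (such as the float[3136] attached here) suggests additional per-class planes.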

I’ve attached the float[3136] array from above as a .txt. How can I convert and scale the mask to fit the image?
The detection box is:
left: 251.45857
top: 270.86142
width: 171.64272
height: 243.53386
sample.txt (31.0 KB)

video resolution: 2688x1520
config: peoplesegnet_config.txt (891 Bytes)
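Assuming the mask grid spans the detection box exactly (typical for Mask R-CNN-style heads, but an assumption here, not something the model card confirms), each mask cell maps to a patch of the frame. A small sketch using the box values above:

```c
/* Map a mask-grid cell (mx, my) to frame pixel coordinates, assuming
 * the mask_w x mask_h grid spans the detection box exactly. This is
 * an assumption for illustration, not a documented guarantee. */
typedef struct { float x, y; } Point;

static Point mask_cell_to_frame(int mx, int my, int mask_w, int mask_h,
                                float left, float top,
                                float box_w, float box_h)
{
  Point p;
  p.x = left + (mx + 0.5f) * box_w / mask_w;  /* cell center, x */
  p.y = top  + (my + 0.5f) * box_h / mask_h;  /* cell center, y */
  return p;
}
```

For a 28x28 grid and the box above, each cell covers roughly 171.6 / 28 ≈ 6.1 by 243.5 / 28 ≈ 8.7 frame pixels.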

Thanks in advance!


You can refer to the link below first:

The NGC model card provides no real information on how to interpret the outputs beyond “Category label (person), bounding-box coordinates and segmentation mask for each detected person in the input image.”

Could you elaborate on how to convert the float[3136] vector from mask_params->data into a pixel mask I can overlay on the output video?


The data field is a pointer, not a float value. It points to the surfaceList of an NvBufSurface, so you can draw the data of the NvBufSurface onto the video.

Is it a Glib List of NvBufSurface?

The code below breaks (it’s C# code, but you get the point):

It is not a list of NvBufSurface pointers either (this breaks too):

If I read the memory (as float) I get the data attached in sample.txt in my first post.

If I take the first 28x28 elements with values above 0, I do in fact get a mask; however, this seems like an incorrect way of doing it?

Is the memory layout wrong or am I interpreting it incorrectly?


Sorry, my fault. I misread the structure of your remarks. The data you marked consists of float values output from the inference. You can convert them to an ARGB32 pixel format. This part of the code is not open source, so I suggest you refer to the API for the conversion: nvds_mask_utils_resize_to_binary_argb32 from nvds_mask_utils.h
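For readers following along, here is a CPU-side illustration of what such a conversion does: binarize the float mask at a threshold and nearest-neighbor-resize it into an ARGB32 buffer sized to the detection box. This is only a sketch of the math; the real nvds_mask_utils_resize_to_binary_argb32 runs on CUDA, and its exact signature should be taken from nvds_mask_utils.h.

```c
#include <stdint.h>

/* CPU illustration (not the DeepStream implementation): binarize a
 * mask_w x mask_h float mask at `threshold` and nearest-neighbor
 * resize it into a dst_w x dst_h ARGB32 buffer. Cells above the
 * threshold get `argb_px` (e.g. 0x8000FF00 for semi-transparent
 * green); everything else becomes fully transparent (0). */
void mask_to_argb32(const float *src, int mask_w, int mask_h,
                    uint32_t *dst, int dst_w, int dst_h,
                    float threshold, uint32_t argb_px)
{
  for (int y = 0; y < dst_h; y++) {
    int sy = y * mask_h / dst_h;           /* nearest source row    */
    for (int x = 0; x < dst_w; x++) {
      int sx = x * mask_w / dst_w;         /* nearest source column */
      float v = src[sy * mask_w + sx];
      dst[y * dst_w + x] = (v > threshold) ? argb_px : 0u;
    }
  }
}
```

Here dst_w x dst_h would be the rounded detection-box size (about 172 x 244 in the example above), and the resulting buffer is then blended onto the frame at (left, top) ≈ (251, 271).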


I’ll use nvds_mask_utils_resize_to_binary_argb32() with the parameters as described in the header file.

Does the method expect the memory for src* and dst* to already be allocated on the CUDA device, or does it allocate internally?

Much appreciated for the help!

EDIT: Never mind, it seems to allocate the CUDA memory internally, so there is no need to allocate it manually. It is working!

Can you explain the channel parameter?


It is the channel count of your inferred data; basically, you can set it to 1.

Much appreciated. Everything is working.

Thanks for the help and have a great weekend!


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.