DeepStream compatibility issues with UNET output layer change

Environment Information:

• Hardware (T4/V100/Xavier/Nano/etc): Tesla T4
• Network Type: Custom UNET Resnet18 - 16 classes
• TLT Version: Model Trained in TAO docker container, DeepStream running in container

I have trained a custom UNET model to perform semantic segmentation. Using the TAO pipeline I have exported the model to an INT8 etlt + calibration file.

Using the ds-tao-segmentation DeepStream app (from the NVIDIA-AI-IOT/deepstream_tao_apps GitHub repository, which provides sample apps demonstrating how to deploy TAO-trained models on DeepStream), I am able to build and run the segmentation app on the default UNET model provided. I get the following output on the test input image:

However, when I switch over to my custom model and relevant test image I get the following:

There are no detections and the pixel mask is all the same class.

When inspecting the UNET output layer configuration, the DeepStream repo lists the output layer as softmax_1. However, the TAO documentation states that the softmax layer is replaced with argmax_1 during the model export optimization phase.
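To make the difference between the two output layers concrete, here is a minimal pure-Python sketch. The logits, the helper names and the "misread" step are all illustrative assumptions; this is not DeepStream or TAO code, and it is not confirmed that nvinfer's segmentation post-processing behaves this way internally. It only shows why a parser that expects a [H, W, C] score tensor could plausibly produce a single-class mask when handed a [H, W, 1] tensor of class indices.

```python
import math

def softmax(scores):
    """Turn the raw class scores of one pixel into probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits for a 1x2 image with 3 classes (layout [H][W][C]).
logits = [[[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]]]

# softmax_1 style output: float scores, one per class, shape [H, W, C].
softmax_out = [[softmax(px) for px in row] for row in logits]

# argmax_1 style output: one integer class id per pixel, shape [H, W, 1].
argmax_out = [[[argmax(px)] for px in row] for row in logits]

# A score-based parser takes the argmax over the channel axis.
decoded = [[argmax(px) for px in row] for row in softmax_out]

# Applying the same parser to the argmax tensor sees a single "channel",
# so every pixel decodes to class 0 regardless of the stored class id.
misread = [[argmax(px) for px in row] for row in argmax_out]

print(decoded, misread)
```

If the mask you see in DeepStream is uniformly one class while TAO inference gives sensible masks, a layout mismatch of this kind is one hypothesis worth ruling out.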

Is this mismatch the reason my custom model produces no detections? If so, should I retrain with a previous version of UNET that does not replace softmax, or should I reconfigure the DeepStream app to accept argmax (and what would that process look like)?

Custom model DeepStream config:
custom_config.txt (1.1 KB)

There’s no error log, just a lack of detections.

Refer to Custom TAO unet model classifying only two classes on Deepstream! - #25 by Morganh

So, please set below in config file.

Made the change but still got a blank pixel mask output

ds-tao-segmentation -c ds_huhf_pgie_config.txt -i 00051.jpg
Now playing: ds_huhf_pgie_config.txt
0:00:02.844431721    45 0x55681b78ba60 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/workspace/deepstream-huhf-unet/model_files/model_huhf_v0_600_cal_int8.etlt_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_1:0       3x512x512
1   OUTPUT kINT32 argmax_1        512x512x1

0:00:02.844538460    45 0x55681b78ba60 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /workspace/deepstream-huhf-unet/model_files/model_huhf_v0_600_cal_int8.etlt_b1_gpu0_int8.engine
0:00:02.858382421    45 0x55681b78ba60 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:ds_huhf_pgie_config.txt sucessfully
in videoconvert caps = video/x-raw(memory:NVMM), format=(string)RGBA, framerate=(fraction)1/1, width=(int)1280, height=(int)720
End of stream
Returned, stopping playback
Deleting pipeline

Can you run inference well via "tao unet inference xxx " ?

Yes, running inference and evaluations via TAO produces good evaluation metrics and predicted segmentation masks.

I’ve tested the engine in TAO as well and although there is some performance loss from the optimization it is still able to detect.

In deepstream, how about running with fp32 or fp16 mode? Is it ok?

I’ll test both fp32 and fp16 and report back.

Update: no luck with fp32 or fp16.

I was able to train the same custom UNET model using an older TAO/TLT version that did not replace the softmax output with an argmax output. When run through the sample ds-tao-segmentation app, this model was able to generate pixel masks for the different classes (note that training was kept short to test the theory, so output quality is not great):

What would the steps be to edit the current DeepStream segmentation app version to accept argmax? What files would I be looking to change in the app?

For 22.05 tao version, could you refer to UNET — TAO Toolkit 3.22.05 documentation ?

I would prefer to use the up-to-date TAO pipeline but it seems it is not compatible with the currently supported DeepStream apps. How can I modify the DeepStream app source code to accommodate the replaced argmax output layer?

Using the current version of TAO is what caused this issue in the first place

There is no need to modify any source code. Is there any log when you run the application? Please share the command, log and your latest config file.

ds-tao-segmentation -c pgie_unet_tao_config.txt -i 00051.jpg

Config File:
pgie_unet_tao_config.txt (845 Bytes)

There’s not much in terms of a log.

Is the mod team able to internally create a multi-class unet model and get proper annotation output from DeepStream using the latest TAO version?

As mentioned previously, could you modify below?

I’ve already tried that fix and it did not have any impact when trying to run the newest version TAO model (see above). However, the older version TAO model does generate mask output using the config in my most recent reply.

The main difference between the two versions seems to be that in the older TAO model the softmax layer is not replaced with an argmax layer. All of the deepstream documentation and examples seem to look for softmax.

Outputs as listed by the Github ds-tao-segmentation repo:

9~10. UNET/PeopleSemSegNet softmax_1: A [batchSize, H, W, C] tensor containing the scores for each class

Is there a way to check whether this layer mismatch is the root cause of the lack of mask output in the newest TAO version models?

Hi Lucasp,
After revisiting this topic, for your model trained with 22.05 TAO, there is no issue when you

  1. Run “tao unet inference”
  2. Run unet tensorrt engine

But you hit the issue when deploying in DeepStream.
According to the TAO user guide, could you add the following to the config file?

## 0=Detector, 1=Classifier, 2=Semantic Segmentation (sigmoid activation), 3=Instance Segmentation, 100=skip nvinfer postprocessing

output-tensor-meta=1 # Set this to 1 when network-type is 100
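With nvinfer post-processing skipped and the raw tensor attached as metadata (assuming network-type is set to 100, as the comment above implies), the application itself would have to interpret the argmax_1 output. The sketch below shows only that interpretation step in plain Python: turning a flat buffer of class ids into a 2-D class map, counting pixels per class, and colouring the result. The function names and the palette are hypothetical, and how the buffer is read out of the DeepStream tensor metadata is deliberately not shown.

```python
from collections import Counter

def class_map_from_argmax(flat_ids, height, width):
    """Reshape a flat H*W*1 buffer of class ids into rows of a class map."""
    if len(flat_ids) != height * width:
        raise ValueError("buffer length does not match height * width")
    return [list(flat_ids[r * width:(r + 1) * width]) for r in range(height)]

def class_histogram(class_map):
    """Count how many pixels were assigned to each class id."""
    return Counter(c for row in class_map for c in row)

def colorize(class_map, palette):
    """Map each class id to an (R, G, B) tuple from a caller-supplied palette."""
    return [[palette[c] for c in row] for row in class_map]

# Hypothetical 2x3 output standing in for the 512x512x1 argmax_1 tensor.
flat = [0, 3, 3, 1, 0, 3]
class_map = class_map_from_argmax(flat, height=2, width=3)
hist = class_histogram(class_map)

# Illustrative palette; a real one would cover all 16 classes.
palette = {0: (0, 0, 0), 1: (255, 0, 0), 3: (0, 0, 255)}
mask = colorize(class_map, palette)
print(dict(hist))
```

A histogram with a single key is a quick sign that the mask really is one class everywhere, as opposed to a rendering problem in the overlay.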

BTW, to inspect the TensorRT engine, you can use polygraphy to check the output layers.
$ python -m pip install colored
$ python -m pip install polygraphy --index-url
$ polygraphy inspect model xxx.engine




Adding those settings to the configuration file didn’t seem to have any impact. I’m still not seeing proper mask predictions.

Polygraphy had the following output when I inspected the .engine model file:

[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

    ---- 1 Engine Input(s) ----
    {input_1:0 [dtype=float32, shape=(1, 3, 512, 512)]}

    ---- 1 Engine Output(s) ----
    {argmax_1 [dtype=int32, shape=(1, 512, 512, 1)]}

    ---- Memory ----
    Device Memory: 122331136 bytes

    ---- 1 Profile(s) (2 Binding(s) Each) ----
    - Profile: 0
        Binding Index: 0 (Input)  [Name: input_1:0] | Shapes: min=(1, 3, 512, 512), opt=(1, 3, 512, 512), max=(1, 3, 512, 512)
        Binding Index: 1 (Output) [Name: argmax_1]  | Shape: (1, 512, 512, 1)

    ---- 47 Layer(s) ----
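For anyone checking several engines, the tensor lines in the report above can be parsed with a few lines of Python. This is a rough sketch that assumes the line format shown in this post; polygraphy's text output may differ between versions, and the "looks like a class-index map" rule is a heuristic of my own, not something from the TAO or DeepStream documentation.

```python
import re

# Matches lines such as: {argmax_1 [dtype=int32, shape=(1, 512, 512, 1)]}
TENSOR_RE = re.compile(
    r"\{(?P<name>\S+) \[dtype=(?P<dtype>\w+), shape=\((?P<shape>[^)]*)\)\]\}"
)

def parse_tensor(line):
    """Return (name, dtype, shape) for a tensor line, or None if no match."""
    match = TENSOR_RE.search(line)
    if match is None:
        return None
    shape = tuple(int(d) for d in match.group("shape").split(",") if d.strip())
    return match.group("name"), match.group("dtype"), shape

def looks_like_class_index_map(dtype, shape):
    """Heuristic: integer dtype with a trailing dimension of 1 suggests argmax."""
    return dtype.startswith(("int", "uint")) and len(shape) > 0 and shape[-1] == 1

output_line = "    {argmax_1 [dtype=int32, shape=(1, 512, 512, 1)]}"
name, dtype, shape = parse_tensor(output_line)
print(name, dtype, shape, looks_like_class_index_map(dtype, shape))
```

For the engine in this post the check reports an integer class-index output, which matches the argmax_1 layer that the TAO export substitutes for softmax_1.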

Could you please provide the latest config file?

I’m facing the same situation.
If this problem has been resolved, I would appreciate it if you could share the solution.

Here is a copy of the latest config file:
unet_config_updated.txt (2.3 KB)

We’re still trying to resolve this issue but I did have success by training a new UNET model in an older version of TAO/TLT that does not replace the softmax output layer with an argmax output layer. You can search through the previous UNET docs to see what TAO/TLT version works. That has been the only way our custom model has been able to generate masks in deepstream.