TAO-trained 3-class UNet model only outputs 2 classes in DeepStream

Hey Folks,

I have a TAO-trained UNet model for 3 classes. Our Python wrapper code is adapted from NVIDIA's reference Python apps and uses our trained UNet .etlt model; we rely on the indexed mask that pyds returns as a cv2/NumPy array. However, the array returned from the DeepStream pipeline only contains 2 classes, even with 'num-detected-classes=3' specified, instead of the 3 class indices we need as output masks for our downstream pipeline.

The TAO-based inference output for the same model is a good 3-class index mask, but DeepStream only gives a 2-class output, where the background is marked as [-1] and the remaining two classes are merged together as [0]. So instead of 3 different classes, we basically get: [Background], [obj1+obj2].

I am quite confused by this, since I have tried different versions of the model trained for different numbers of epochs, and on different machines, but I always get only a 2-class output.

We would like to get the same 3-class mask output that the TAO inference results give, for further use. Any help or suggestion would be appreciated; we are badly stuck on this. I have enclosed the Python wrapper and our config file below.

• Python_Wrapper

import sys
sys.path.append('../')
import gi
import math

gi.require_version('Gst', '1.0')
from gi.repository import GLib, Gst
from common.is_aarch_64 import is_aarch64
from common.bus_call import bus_call

import cv2
import pyds
import numpy as np
import os.path
from os import path
import ctypes

ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

MAX_DISPLAY_LEN = 64
MUXER_OUTPUT_WIDTH = 1920
MUXER_OUTPUT_HEIGHT = 1080
MUXER_BATCH_TIMEOUT_USEC = 4000000
TILED_OUTPUT_WIDTH = 512
TILED_OUTPUT_HEIGHT = 512
COLORS = [[128, 128, 64], [0, 0, 128], [0, 128, 128], [128, 0, 0],
          [128, 0, 128], [128, 128, 0], [0, 128, 0], [0, 0, 64],
          [0, 0, 192], [0, 128, 64], [0, 128, 192], [128, 0, 64],
          [128, 0, 192], [128, 128, 128]]

def map_mask_as_display_bgr(mask):
    """ Assigning multiple colors as image output using the information
        contained in mask. (BGR is opencv standard.)
    """
    # getting a list of available classes
    m_list = list(set(mask.flatten()))
    print('m_list',m_list)

    shp = mask.shape
    print(np.unique(mask))
    bgr = np.zeros((shp[0], shp[1], 3))#,dtype=np.int32)
    print(np.unique(bgr))
    for idx in m_list:
        print((idx),COLORS[idx])
        bgr[mask == idx] = COLORS[idx]
        #bgr[mask == idx] = idx
    print(np.unique(bgr))
    #print(bgr)
    return bgr


def seg_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    print(gst_buffer)
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
            # The casting is done by pyds.NvDsFrameMeta.cast()
            # The casting also keeps ownership of the underlying memory
            # in the C code, so the Python garbage collector will leave
            # it alone.
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            print(frame_meta)
           

        except StopIteration:
            break
        frame_number = frame_meta.frame_num
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            try:
                # Note that l_user.data needs a cast to pyds.NvDsUserMeta
                # The casting is done by pyds.NvDsUserMeta.cast()
                # The casting also keeps ownership of the underlying memory
                # in the C code, so the Python garbage collector will leave
                # it alone.
                seg_user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            except StopIteration:
                break
            ####TensorOutput
            meta_type = seg_user_meta.base_meta.meta_type
            if meta_type == pyds.NVDSINFER_TENSOR_OUTPUT_META:
                meta = pyds.NvDsInferTensorMeta.cast(seg_user_meta.user_meta_data)
                #classid=pyds.NvDsInferObjectDetectionInfo.classId()
                #print(classid(meta_type))


                frame_outputs = []
                for i in range(meta.num_output_layers):
                    layer = pyds.get_nvds_LayerInfo(meta, i)
                    # Convert NvDsInferLayerInfo buffer to numpy array
                    ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
                    print(ptr)
                    #v = np.ctypeslib.as_array(ptr, shape=output_shapes[i])
                    v = np.ctypeslib.as_array(ptr, shape=(4096,))
                    frame_outputs.append(v)
                print(np.unique(frame_outputs))


            ####DetectionInfo
            #meta_type = seg_user_meta.base_meta.meta_type
            #if meta_type == pyds.NVDSINFER_OBJECT_DETECTION_INFO:
            #meta = pyds.NvDsInferObjectDetectinInfo.cast(seg_user_meta.user_meta_data)
            #print(meta.classId)
                #frame_outputs = []
                #for i in range(meta.num_output_layers):
                #    layer = pyds.get_nvds_LayerInfo(meta, i)
                #    # Convert NvDsInferLayerInfo buffer to numpy array
                #    ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
                #    print(ptr)
                #    #v = np.ctypeslib.as_array(ptr, shape=output_shapes[i])
                #    v = np.ctypeslib.as_array(ptr, shape=(4096,))
                #    frame_outputs.append(v)
                #print(np.unique(frame_outputs))


            
            ####SegmentatioMeta
            if seg_user_meta and seg_user_meta.base_meta.meta_type == \
                    pyds.NVDSINFER_SEGMENTATION_META:
                try:
                    # Note that seg_user_meta.user_meta_data needs a cast to
                    # pyds.NvDsInferSegmentationMeta
                    # The casting is done by pyds.NvDsInferSegmentationMeta.cast()
                    # The casting also keeps ownership of the underlying memory
                    # in the C code, so the Python garbage collector will leave
                    # it alone.
                    segmeta = pyds.NvDsInferSegmentationMeta.cast(seg_user_meta.user_meta_data)
                    print(seg_user_meta.user_meta_data)
                    print('class',segmeta.classes)
                except StopIteration:
                    break
                # Retrieve mask data in the numpy format from segmeta
                # Note that pyds.get_segmentation_masks() expects object of
                # type NvDsInferSegmentationMeta
                ''' 
                meta = pyds.NvDsInferTensorMeta.cast(seg_user_meta.user_meta_data)
                frame_outputs = []
                for i in range(meta.num_output_layers):
                    print(i)
                    layer = pyds.get_nvds_LayerInfo(meta, i)
                    # Convert NvDsInferLayerInfo buffer to numpy array
                    ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
                    v = np.ctypeslib.as_array(ptr, shape=output_shapes[i])
                    frame_outputs.append(v)
                    print(v)
                '''

                print('classout',segmeta.classes)
                masks = pyds.get_segmentation_masks(segmeta)
                print('before',np.unique(np.array(masks)))
                print('mask_shape',masks.shape)
                np.save('masks.npy',masks) 
                masks = np.array(masks, copy=True, order='C')
                print('after',np.unique(masks))
                print(masks.shape)
                print(masks)
                print('class',segmeta.classes)                
                # map the obtained masks to colors of 2 classes.
                frame_image = map_mask_as_display_bgr(masks)
                print(np.unique(frame_image.astype(np.uint8)), frame_image.shape)
                cv2.imwrite(folder_name + "/" + str(frame_number) + ".jpg", frame_image.astype(np.uint8))
                #cv2.imwrite(folder_name + "/" + str(frame_number) + ".jpg", masks)
            try:
                l_user = l_user.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK


def main(args):
    # Check input arguments
    if len(args) != 4:
        sys.stderr.write("usage: %s config_file <jpeg/mjpeg file> "
                         "<path to save seg images>\n" % args[0])
        sys.exit(1)

    global folder_name
    folder_name = args[-1]
    if path.exists(folder_name):
        sys.stderr.write("The output folder %s already exists. "
                         "Please remove it first.\n" % folder_name)
        sys.exit(1)
    os.mkdir(folder_name)

    config_file = args[1]
    num_sources = len(args) - 3
    # Standard GStreamer initialization
    Gst.init(None)

    # Create gstreamer elements
    # Create Pipeline element that will form a connection of other elements
    print("Creating Pipeline \n ")
    pipeline = Gst.Pipeline()

    if not pipeline:
        sys.stderr.write(" Unable to create Pipeline \n")

    # Source element for reading from the file
    print("Creating Source \n ")
    source = Gst.ElementFactory.make("filesrc", "file-source")
    if not source:
        sys.stderr.write(" Unable to create Source \n")

    # Since the data format in the input file is jpeg,
    # we need a jpegparser
    print("Creating jpegParser \n")
    jpegparser = Gst.ElementFactory.make("jpegparse", "jpeg-parser")
    if not jpegparser:
        sys.stderr.write("Unable to create jpegparser \n")

    # Use nvdec for hardware accelerated decode on GPU
    print("Creating Decoder \n")
    decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")
    if not decoder:
        sys.stderr.write(" Unable to create Nvv4l2 Decoder \n")

    # Create nvstreammux instance to form batches from one or more sources.
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        sys.stderr.write(" Unable to create NvStreamMux \n")

    # Create segmentation for primary inference
    seg = Gst.ElementFactory.make("nvinferbin", "primary-nvinference-engine")
    if not seg:
        sys.stderr.write("Unable to create primary inferene\n")

    # Create nvsegvisual for visualizing segmentation
    nvsegvisual = Gst.ElementFactory.make("nvsegvisual", "nvsegvisual")
    if not nvsegvisual:
        sys.stderr.write("Unable to create nvsegvisual\n")

    if is_aarch64():
        transform = Gst.ElementFactory.make("nvegltransform", "nvegl-transform")

    print("Creating EGLSink \n")
    #sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
    sink = Gst.ElementFactory.make("filesink", "nvvideo-renderer")
    if not sink:
        sys.stderr.write(" Unable to create egl sink \n")

    print("Playing file %s " % args[2])
    source.set_property('location', args[2])
    if is_aarch64() and (args[2].endswith("mjpeg") or args[2].endswith("mjpg")):
        decoder.set_property('mjpeg', 1)
    streammux.set_property('width', 1920)
    streammux.set_property('height', 1080)
    streammux.set_property('batch-size', 1)
    streammux.set_property('batched-push-timeout', 4000000)
    seg.set_property('config-file-path', config_file)
    pgie_batch_size = seg.get_property("batch-size")
    if pgie_batch_size != num_sources:
        print("WARNING: Overriding infer-config batch-size", pgie_batch_size,
              " with number of sources ", num_sources,
              " \n")
        seg.set_property("batch-size", num_sources)
    nvsegvisual.set_property('batch-size', num_sources)
    nvsegvisual.set_property('width', 512)
    nvsegvisual.set_property('height', 512)
    #sink.set_property("qos", 0)
    sink.set_property("location", 'sample_out.mkv')
    print("Adding elements to Pipeline \n")
    pipeline.add(source)
    pipeline.add(jpegparser)
    pipeline.add(decoder)
    pipeline.add(streammux)
    pipeline.add(seg)
    pipeline.add(nvsegvisual)
    pipeline.add(sink)
    
    if is_aarch64():
        pipeline.add(transform)

    # we link the elements together
    # file-source -> jpeg-parser -> nvv4l2-decoder ->
    # nvinfer -> nvsegvisual -> sink
    print("Linking elements in the Pipeline \n")
    source.link(jpegparser)
    jpegparser.link(decoder)

    sinkpad = streammux.get_request_pad("sink_0")
    if not sinkpad:
        sys.stderr.write(" Unable to get the sink pad of streammux \n")
    srcpad = decoder.get_static_pad("src")
    if not srcpad:
        sys.stderr.write(" Unable to get source pad of decoder \n")
    srcpad.link(sinkpad)
    streammux.link(seg)
    seg.link(nvsegvisual)
    if is_aarch64():
        nvsegvisual.link(transform)
        transform.link(sink)
    else:
        nvsegvisual.link(sink)
    # create an event loop and feed GStreamer bus messages to it
    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    # Lets add probe to get informed of the meta data generated, we add probe to
    # the src pad of the inference element
    seg_src_pad = seg.get_static_pad("src")
    if not seg_src_pad:
        sys.stderr.write(" Unable to get src pad \n")
    else:
        seg_src_pad.add_probe(Gst.PadProbeType.BUFFER, seg_src_pad_buffer_probe, 0)

    # List the sources
    print("Now playing...")
    for i, source in enumerate(args[1:-1]):
        if i != 0:
            print(i, ": ", source)

    print("Starting pipeline \n")
    # start playback and listen to events
    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    except:
        pass
    # cleanup
    pipeline.set_state(Gst.State.NULL)


if __name__ == '__main__':
    sys.exit(main(sys.argv))

• Config_file

[property]
gpu-id=0
net-scale-factor=0.007843
# The model input has 3 channels, so a color format is used (0=RGB, 1=BGR, 2=GRAY).
model-color-format=1
offsets=127.5;127.5;127.5

labelfile-path=labels.txt

model-engine-file=Model.etlt_b1_gpu0_fp32.engine
infer-dims=3;512;512
batch-size=1

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=3
interval=0
gie-unique-id=1
network-type=2
output-blob-names=argmax_1
#segmentation-threshold=0.0
maintain-aspect-ratio=0
segmentation-output-order=1
secondary-reinfer-interval=15

[class-attrs-all]
threshold=0.0
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
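
One note, as an assumption on my side: the probe above has a branch for pyds.NVDSINFER_TENSOR_OUTPUT_META, but Gst-nvinfer normally only attaches raw output tensors as metadata when output-tensor-meta is enabled. A minimal sketch of the addition to the [property] section, if that branch is meant to receive data:

# Assumption: attach raw inference output tensors (NvDsInferTensorMeta) to the
# buffer so the NVDSINFER_TENSOR_OUTPUT_META branch in the probe gets data.
output-tensor-meta=1

This does not change the segmentation meta itself; it only makes the raw argmax_1 layer available for inspection in the probe.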

• Code output

class_Detected : 1
mask_shape : (512, 512)
Mask_instances : [0,-1]
Output_array :
[[-1 -1 -1 ... -1 -1 -1]
 [-1 -1 -1 ... -1 -1 -1]
 [-1 -1 -1 ... -1 -1 -1]
 ...
 [-1 -1 -1 ... -1 -1 -1]
 [-1 -1 -1 ... -1 -1 -1]
 [-1 -1 -1 ... -1 -1 -1]]
class_Detected : 1

Instance_and_assigned_color :  0 [128, 128, 64]
Instance_and_assigned_color : -1 [128, 128, 128]
[ 64. 128.]
(array([ 64, 128], dtype=uint8), array([269486,      0]))
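
For what it is worth, the gray background color in the output above is just Python list indexing: a mask value of -1 wraps around to the last entry of COLORS, so no error is raised. A small sketch reproducing the mapping seen above:

COLORS = [[128, 128, 64], [0, 0, 128], [0, 128, 128], [128, 0, 0],
          [128, 0, 128], [128, 128, 0], [0, 128, 0], [0, 0, 64],
          [0, 0, 192], [0, 128, 64], [0, 128, 192], [128, 0, 64],
          [128, 0, 192], [128, 128, 128]]
for idx in [0, -1]:
    # -1 wraps to COLORS[-1] = [128, 128, 128], matching the mapping printed above
    print('Instance_and_assigned_color : ', idx, COLORS[idx])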

• Hardware Platform (Jetson / GPU)

Tesla T4

• DeepStream Version

deepstream-app version 6.0.1
DeepStreamSDK 6.0.1
CUDA Driver Version: 11.4
CUDA Runtime Version: 11.4
TensorRT Version: 8.4
cuDNN Version: 8.4
libNVWarp360 Version: 2.0.1d3
gst-launch-1.0 version 1.20.3
GStreamer 1.20.3

Hi @nitinp14920914,
If you use the UNET sample from GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (sample apps to demonstrate how to deploy models trained with TAO on DeepStream) to run your UNET, does it give the correct output?

Hi,
Actually, I have already come across that resource, but we need our system to work with Python, and the referenced sample is a C resource. Also, our issue is not with deploying and using the TAO model with the DeepStream reference app. Our main issue is that instead of a 3-class detection it gives us only a 2-class output: the background class is detected well, but class 1 and class 2 somehow get merged into one class.
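
In case it helps narrow this down, here is a minimal sketch (an assumption on my part, not a verified fix) for inspecting the raw argmax_1 layer from NvDsInferTensorMeta inside the same probe, to check whether the engine itself still produces 3 class indices before the segmentation meta is built. It assumes output-tensor-meta=1 is set in the nvinfer config and that the layer is float32 with 512x512 elements, as the existing probe already assumes; the dtype and shape may need adjusting for the exported model.

import ctypes
import numpy as np
import pyds

def dump_raw_class_map(tensor_meta, height=512, width=512):
    # Iterate the attached output layers and print the unique class ids
    # present in each one after reshaping to the network output resolution.
    for i in range(tensor_meta.num_output_layers):
        layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
        ptr = ctypes.cast(pyds.get_ptr(layer.buffer),
                          ctypes.POINTER(ctypes.c_float))
        raw = np.ctypeslib.as_array(ptr, shape=(height * width,))
        class_map = raw.reshape(height, width).astype(np.int32)
        print(layer.layerName, 'unique class ids:', np.unique(class_map))

Calling dump_raw_class_map(meta) in the NVDSINFER_TENSOR_OUTPUT_META branch should show whether the merge already happens in the engine output or only later, when the segmentation meta is generated.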

Sorry for the delay! Have you confirmed that the model you trained can output 3 classes?

Yes, we have confirmed that: we used the same model and checked the inference output from TAO, which gives a 3-class output.

After changing the model I am getting a 3-class output, but it is not right: one class seems to superimpose over the second class (only with DeepStream inferencing), and the detection is very noisy. A small comparison sketch against the TAO mask is included after the config below.

My config file now looks like :

[property]

labelfile-path=../Model/labels.txt
engine-create-func-name=NvDsInferYoloCudaEngineGet

tlt-encoded-model=../Model/Model.etlt

net-scale-factor=0.00784313725490196
offsets=127.5;127.5;127.5
infer-dims=3;512;512
tlt-model-key=Key
network-type=2
num-detected-classes=3
model-color-format=1
segmentation-threshold=0.0
output-blob-names=argmax_1
segmentation-output-order=1
gie-unique-id=1

[class-attrs-all]
threshold=0.3
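
And here is the comparison sketch mentioned above, using hypothetical file names: masks.npy is the array the probe saves, and tao_mask.png is the indexed mask that TAO inference produced for the same frame. It counts how the DeepStream class ids map onto the TAO class ids, which should show whether two classes are being merged or swapped rather than genuinely missing.

import numpy as np
import cv2

ds_mask = np.load('masks.npy')                               # mask saved by the probe (512x512)
tao_mask = cv2.imread('tao_mask.png', cv2.IMREAD_GRAYSCALE)  # TAO indexed mask for the same frame

for tao_id in np.unique(tao_mask):
    ds_ids, counts = np.unique(ds_mask[tao_mask == tao_id], return_counts=True)
    print('TAO class', tao_id, '-> DeepStream ids',
          dict(zip(ds_ids.tolist(), counts.tolist())))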