Bounding box coordinates in Python - rect_params are incorrect

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson Xavier NX
• DeepStream Version
6.0.1
• JetPack Version (valid for Jetson only)
4.6

I have a Python application that needs to make a cutout of the frame taken from the GStreamer/DeepStream pipeline through a buffer probe. Somehow, though, the coordinates of the detected objects are incorrect.

Example:

This is the metadata I get:

{'type': 'object', 'class_id': 0, 'obj_label': 'Car', 'confidence': 0.8906721472740173, 'coordinates': (1088, 453, 1259, 576), 'x_left': 1088, 'x_right': 1259, 'y_top': 453, 'y_bottom': 576}
{'type': 'object', 'class_id': 0, 'obj_label': 'Car', 'confidence': 0.8321117758750916, 'coordinates': (715, 452, 985, 557), 'x_left': 715, 'x_right': 985, 'y_top': 452, 'y_bottom': 557}
{'type': 'object', 'class_id': 0, 'obj_label': 'Car', 'confidence': 0.79091477394104, 'coordinates': (1693, 466, 1851, 578), 'x_left': 1693, 'x_right': 1851, 'y_top': 466, 'y_bottom': 578}
{'type': 'object', 'class_id': 0, 'obj_label': 'Car', 'confidence': 0.24974358081817627, 'coordinates': (1864, 545, 1917, 595), 'x_left': 1864, 'x_right': 1917, 'y_top': 545, 'y_bottom': 595}

The OSD overlay looks very accurate, but these coordinates do not match it. I get this metadata by doing (some info redacted):

frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
obj_meta_list = frame_meta.obj_meta_list
obj_meta = pyds.NvDsObjectMeta.cast(obj_meta_list.data)

x_left = int(obj_meta.rect_params.left)
x_right = int(obj_meta.rect_params.left + obj_meta.rect_params.width)
y_top = int(obj_meta.rect_params.top)
y_bottom = int(obj_meta.rect_params.top + obj_meta.rect_params.height)
coordinates = (x_left, y_top, x_right, y_bottom)

obj_meta_dict = {
    "type": "object",
    "class_id": obj_meta.class_id,
    "obj_label": obj_meta.obj_label,
    "confidence": obj_meta.confidence,
    "coordinates": coordinates,
    "x_left": x_left,
    "x_right": x_right,
    "y_top": y_top,
    "y_bottom": y_bottom
}

I have to use rect_params because, as far as I know, those are the only coordinates I can get from the Python bindings? I found

NvDsComp_BboxInfo NvDsObjectMeta::detector_bbox_info

in the DeepStream API guide, but this struct/object seems to be unreachable from Python, i.e. it's not included in the bindings? Also, some links that should guide me to more information actually land on a generic page.

I believe the rect_params are somehow altered by some plugin in the pipeline. How do I get the original bounding box coordinates of the detected objects?
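For reference, newer builds of the Python bindings reportedly do expose this struct. A minimal sketch of how the access would look, assuming a pyds version that includes the NvDsComp_BboxInfo binding (field names taken from the C API; older pyds releases may not have them):

# Sketch: the detector's original, pre-tracker/pre-OSD box coordinates,
# assuming obj_meta.detector_bbox_info is present in your pyds build
bbox = obj_meta.detector_bbox_info.org_bbox_coords
x_left = int(bbox.left)
y_top = int(bbox.top)
x_right = int(bbox.left + bbox.width)
y_bottom = int(bbox.top + bbox.height)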

PGIE (nvinfer) config:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-file=../deepstream_models/Primary_Detector/resnet10.caffemodel
proto-file=../deepstream_models/Primary_Detector/resnet10.prototxt
model-engine-file=../deepstream_models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
labelfile-path=../deepstream_models/Primary_Detector/labels.txt
int8-calib-file=../deepstream_models/Primary_Detector/cal_trt.bin
force-implicit-batch-dim=1
cluster-mode=3
batch-size=1
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

[class-attrs-all]
topk=5

Some help, documentation, examples, etc. would be greatly appreciated.

nvdsosd presumably uses the same rect_params to draw the bounding boxes. Why do you say the OSD is right but the values you get are wrong? Do you get the values on the nvdsosd sink pad?

That’s exactly what I thought… It does not matter on which plugin I attach a buffer probe, whether before or after the nvosd: I get bounding boxes that are far too small and mostly wrongly placed. Something is messing up the parameters, but I’m not sure what… ?

src_pad = nvosd.get_static_pad("src")
if not src_pad:
    logging.error(" Unable to get src pad for bufferprobe.")
    return -1
else:
    src_pad.add_probe(Gst.PadProbeType.BUFFER, buffer_probe, 0)

In my original post the metadata is from the exact same frame as the still/screenshot with the OSD. You can clearly see that the coordinates aren’t right.

My pipeline is currently:

source -> streammux -> nvof -> nvinfer -> nvvidconv -> nvosd -> fakesink

I’ve tried switching off the nvof and shuffling the elements around, but that doesn’t improve anything. Could there be some parameter somewhere that scales the rect inputs/outputs?
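One thing worth noting here: nvstreammux scales every source to its configured width/height, so rect_params live in muxer output space (1920x1080 in this pipeline), not in source space. If a source has a different resolution, a rescale along the lines of the sketch below maps the boxes back; this assumes plain stretching (no letterboxing) and uses the source_frame_width/source_frame_height members of NvDsFrameMeta:

# Sketch: map a rect from streammux output space back to source space.
# MUX_W/MUX_H are assumptions and must match the nvstreammux properties.
MUX_W, MUX_H = 1920, 1080

def to_source_coords(rect, frame_meta):
    sx = frame_meta.source_frame_width / MUX_W
    sy = frame_meta.source_frame_height / MUX_H
    return (int(rect.left * sx), int(rect.top * sy),
            int((rect.left + rect.width) * sx),
            int((rect.top + rect.height) * sy))

Note that nvstreammux also has an enable-padding property that letterboxes instead of stretching; with that set, the simple rescale above no longer holds.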

Now that I come to think about it… I’m so confident that the frame and metadata are the same, but are they? …

What I do is combine the metadata (that I need) and the frame into an object and put this object into a multiprocessing queue for further processing. I assumed that the metadata and the frame from the same buffer probe call belong together. But do they really? Could the streammuxer be messing things up?

I’ve tried getting rid of the streammuxer, but I’m unsure how. When I just remove it and link the source directly to the nvof or nvinfer plugin, things stop working. I think I don’t need the streammuxer because I’m only ever going to use one source (mostly RTSP, but sometimes files during testing). Also, for proprietary reasons, the resizing function of the streammuxer is really annoying: some of our input streams have different resolutions and need to stay that way for a reason. The pipeline shouldn’t resize them, but the streammuxer needs a fixed width and height set, which annoys me. Anyhow, still trying to figure this out. Can’t sleep at night from it ;)

For completeness’ sake, this is my buffer probe function. For IP reasons I’ve redacted some unimportant parts.

def buffer_probe(pad, info, u_data):
    frame_number = 0
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        logging.error("Unable to get GstBuffer.")
        return Gst.PadProbeReturn.OK

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    # get_nvds_buf_surface() maps the frame surface as a NumPy array
    # (the buffer must be in RGBA format); note this is a view of the
    # surface memory, not a copy
    current_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), 0)

    frame_meta_list = batch_meta.frame_meta_list

    # Note that frame_meta_list.data needs a cast to pyds.NvDsFrameMeta
    # The casting is done by pyds.NvDsFrameMeta.cast()
    # The casting also keeps ownership of the underlying memory
    # in the C code, so the Python garbage collector will leave
    # it alone.
    frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
    
    frame_number = frame_meta.frame_num
    obj_meta_list = frame_meta.obj_meta_list
    user_meta_list = frame_meta.frame_user_meta_list

    casted_meta_list = []
    while obj_meta_list is not None:
        try: 
            # Casting obj_meta_list.data to pyds.NvDsObjectMeta
            obj_meta=pyds.NvDsObjectMeta.cast(obj_meta_list.data)
        except StopIteration:
            break
        

        # We have to get rid of all the pyds.NvDs* objects in the metadata because we can't pickle those, so we just get what we need for
        # the detectors to do their job down the line...
        x_left = int(obj_meta.rect_params.left)
        x_right = int(obj_meta.rect_params.left + obj_meta.rect_params.width)
        y_top = int(obj_meta.rect_params.top)
        y_bottom = int(obj_meta.rect_params.top + obj_meta.rect_params.height)
        coordinates = (x_left, y_top, x_right, y_bottom)

        obj_meta_dict = {
            "type": "object",
            "class_id": obj_meta.class_id,
            "obj_label": obj_meta.obj_label,
            "confidence": obj_meta.confidence,
            "coordinates": coordinates,
            "x_left": x_left,
            "x_right": x_right,
            "y_top": y_top,
            "y_bottom": y_bottom
        }
        casted_meta_list.append(obj_meta_dict)

        try: 
            obj_meta_list = obj_meta_list.next
        except StopIteration:
            break
    
    while user_meta_list is not None:
        try:
            of_user_meta = pyds.NvDsUserMeta.cast(user_meta_list.data)
        except StopIteration:
            break
        try:
            # Casting of_user_meta.user_meta_data to pyds.NvDsOpticalFlowMeta
            of_meta = pyds.NvDsOpticalFlowMeta.cast(of_user_meta.user_meta_data)
            # Get Flow vectors
            flow_vectors = pyds.get_optical_flow_vectors(of_meta)
            # Reshape the obtained flow vectors into proper shape
            flow_vectors = flow_vectors.reshape(of_meta.rows, of_meta.cols, 2)

            user_meta_dict = {
                "type": "opticalflow",
                "flow_vectors": flow_vectors
            }
            casted_meta_list.append(user_meta_dict)

        except StopIteration:
            break
        try:
            user_meta_list = user_meta_list.next
        except StopIteration:
            break

    try:
        detector_input_queue.put(DetectorInputCapsule(current_frame, frame_number, casted_meta_list), block=False)
    except Full:
        try:
            while True: # a Multiprocessing Queue does not have a clear() method or something similar, so we have to clear it the dirty way
                detector_input_queue.get_nowait()
        except Empty:
            pass    

    return Gst.PadProbeReturn.OK

I found a topic where someone had a similar (or the exact same?) problem, but he was using multiple sources, which somehow messed up the rect_params for him. I’m not using multiple sources, but I am using the streammuxer, so I’m still guessing it has something to do with it.

This is my pipeline creation, by the way:

    logging.debug("Creating Gstreamer/Deepstream Pipeline.")
    if not pipeline:
        logging.error(" Unable to create Pipeline.")
        return -1

    # Create nvstreammux instance to form batches from one or more sources.
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        logging.error(" Unable to create NvStreamMux.")
        return -1

    pipeline.add(streammux)
   
    is_live = False
    if config.input_stream.find("rtsp://") == 0:
        is_live = True
    source_bin = deepstream_common.utils.create_source_bin(0, config.input_stream)
    if not source_bin:
        logging.error(" Unable to create source bin.")
        return -1
    pipeline.add(source_bin)
    padname = "sink_0"
    sinkpad = streammux.get_request_pad(padname) 
    if not sinkpad:
        logging.error(" Unable to create sink pad bin.")
        return -1
    srcpad = source_bin.get_static_pad("src")
    if not srcpad:
        logging.error(" Unable to create src pad bin.")
        return -1
    srcpad.link(sinkpad)

    # Pipelines
    # With video output: source -> streammux ->  nvof -> nvinfer -> nvvidconv -> nvosd -> (nvegltransform) -> nveglglessink
    # Without video output: source -> streammux -> nvof -> nvinfer -> nvvidconv -> nvosd -> fakesink

    if config.tracking_list or config.motion_alert_list:
        nvof = Gst.ElementFactory.make("nvof", "optical-flow")
        if not nvof:
            logging.error(" Unable to create nvof.")
            return -1
        pipeline.add(nvof)

    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not pgie:
        logging.error(" Unable to create pgie.")
        return -1
    pipeline.add(pgie)

    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
    if not nvvidconv:
        logging.error(" Unable to create nvvidconv.")
        return -1
    pipeline.add(nvvidconv)

    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    if not nvosd:
        logging.error(" Unable to create nvosd.")
        return -1
    
    # https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvdsosd.html
    if is_aarch64():
        nvosd.set_property('process-mode', 2)
    else:
        nvosd.set_property('process-mode', 1)

    # Switch off text and bounding boxes when video output is turned off.
    # Totally removing the OSD somehow switches off RGBA conversion by the
    # nvvidconv, in turn ruining our pipeline/buffer probe.
    if config.video_output:
        nvosd.set_property('display-text', 1)
        nvosd.set_property('display-bbox', 1)
    else:
        nvosd.set_property('display-text', 0)
        nvosd.set_property('display-bbox', 0)
    pipeline.add(nvosd)

    if config.video_output:
        # On a Jetson platform Gst-nveglglessink works on EGLImage structures.
        # Gst-nvegltranform is required to convert incoming data (wrapped in an NVMM structure) to an EGLImage instance.
        # On a dGPU platform, Gst-nveglglessink works directly on data wrapped in an NVMM structure.
        if is_aarch64():
            transform=Gst.ElementFactory.make("nvegltransform", "nvegl-transform")
            if not transform:
                logging.error(" Unable to create transform.")
                return -1
            pipeline.add(transform)
        sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
        if not sink:
            logging.error(" Unable to create egl sink.")
            return -1
        sink.set_property("qos",0)
    else:
        sink = Gst.ElementFactory.make('fakesink', 'fake-sink')
        sink.set_property('sync', 1)
    pipeline.add(sink)

    if is_live:
        streammux.set_property('live-source', 1)

    pgie.set_property('config-file-path', "deepstream_config/ds_pgie_config.txt")
    
    streammux.set_property('width', 1920)
    streammux.set_property('height', 1080)
    streammux.set_property('batch-size', 1)
    streammux.set_property('batched-push-timeout', 1000000)

    if config.tracking_list or config.motion_alert_list:
        streammux.link(nvof)
        nvof.link(pgie)
    else:
        streammux.link(pgie)

    pgie.link(nvvidconv)
    nvvidconv.link(nvosd)
    
    if config.video_output and is_aarch64():
        nvosd.link(transform)
        transform.link(sink)
    else:
        nvosd.link(sink)

    # create an event loop and feed gstreamer bus messages to it
    gstreamer_loop = GObject.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, gstreamer_loop)

    src_pad = nvosd.get_static_pad("src")
    if not src_pad:
        logging.error(" Unable to get src pad for bufferprobe.")
        return -1
    else:
        src_pad.add_probe(Gst.PadProbeType.BUFFER, buffer_probe, 0)

See anything out of the ordinary? I have some switches in place to enable or disable video output (and the OSD), to enable or disable optical flow analysis, and for x86/Jetson compatibility (or so is the idea). For the rest, I believe it’s fairly standard code taken from the examples…

I finally found the cause of the problem! Man, this took me 3 days to figure out and I could’ve known from the start…

The situation: as you can see from my code, I encapsulate the framebuffer (an ndarray), the frame number and the corresponding metadata and put the capsule into a multiprocessing.Queue. Another process picks these capsules up for further processing.

I had been saving the stills and metadata at “the other side” of the queue.

Sometimes, especially in the first few seconds, frames accumulate in the queue as the gstreamer pipeline fills it up faster than the other process can keep up. This should not be a problem, as we are not doing real-time analysis, so a small lag is OK.

But: this actually IS a problem when you accidentally pass in the framebuffer BY REFERENCE… I had never noticed this. When I started saving the stills inside the buffer_probe function itself, I noticed they weren’t the same as the ones I had saved at the other side of the queue.
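The mechanism in isolation: pyds.get_nvds_buf_surface() returns a NumPy view of the surface memory, and a queue only stores a reference to it, so the pipeline can overwrite the pixels before the consumer ever reads them. A tiny standalone sketch of the same effect with a plain ndarray and a queue:

import queue
import numpy as np

q = queue.Queue()
frame = np.zeros((2, 2), dtype=np.uint8)  # stands in for the mapped surface
q.put(frame)                              # enqueues a reference, not the pixels
frame[:] = 255                            # the "pipeline" reuses the buffer
print(q.get())                            # prints 255s: the enqueued frame changed

With a multiprocessing.Queue the pickling happens in a background feeder thread, so the same aliasing can bite there too.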

Solution:

copy.deepcopy(frame)

I feel so stupid that it took me so long to figure this out. I thought it was the deepstream pipeline ruining the metadata somehow. But it was simply the still/framebuffer that was way ahead in time because of the reference.

Disclaimer: the deepcopy is quite an expensive operation. The queue.put() call took about 100 microseconds in the old situation; with the deepcopy we are now at about 1000 microseconds. Something to take into account, since buffer_probe is a blocking function and might stall the gstreamer pipeline if it takes much longer than this. I hope, and think, 1 millisecond is still manageable.
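For what it’s worth, a plain NumPy copy should do the same job as copy.deepcopy for an ndarray, usually with less overhead; a minimal sketch, assuming current_frame is the array returned by get_nvds_buf_surface():

import numpy as np

frame_copy = np.array(current_frame, copy=True, order='C')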

Great! Glad to know you fixed the issue.
