(Python) memory error on the same frame each run at certain confidence values

I have a Python pipeline with a single detector followed by some custom logic that is applied to each frame.
The pipeline runs fine when the detector's confidence threshold is set around 0.6, but with a lower value it fails at the same frame every run with this error:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
The only lead I have is the confidence values themselves, which suggest the crash might be related to a faulty low-confidence detection.

But since DeepStream is hard to debug, I can't really pinpoint the issue.
The crash occurs with two different test videos; each video has a different frame that triggers it, but for a given video it is always the same frame
(e.g. video 1 always crashes at frame x on every run, while any run using video 2 crashes at frame y).
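Since exit code 139 means the crash happens in native code (the Python traceback is lost), one low-effort first step is the standard-library faulthandler module, which dumps the Python-level stack when the process receives SIGSEGV. This is a general CPython debugging sketch, not DeepStream-specific:

```python
import faulthandler
import signal
import sys

# Dump the Python traceback of all threads to stderr if the process
# receives SIGSEGV, SIGFPE, SIGABRT, or SIGBUS.  Enable this as early
# as possible, before the pipeline is built.
faulthandler.enable(file=sys.stderr, all_threads=True)

# Optionally also dump tracebacks on demand while the pipeline runs
# (from another terminal: kill -USR1 <pid>).
faulthandler.register(signal.SIGUSR1, all_threads=True)
```

The dump only shows which Python frame was active when the segfault happened (often a probe callback or a pyds call); a full native backtrace still requires running the interpreter under gdb.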
The detector is YOLOv4 trained on the COCO dataset, and this is its config file:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=../models/yolov4_original_30_streams_dynamic/yolov4-dynamic_30.engine
labelfile-path=../models/yolov4_original_30_streams_dynamic/labels.txt
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
filter-out-class-ids=2;3;5;7 ## filter out the vehicle classes (car, motorbike, bus, truck)
## 0=Group Rectangles, 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV4
custom-lib-path=libnvdsinfer_custom_impl_YoloV4_8classees_PersonCarMotorbikeBusTruckBackpackHandbagSuitcase.so
engine-create-func-name=NvDsInferYoloCudaEngineGet


[class-attrs-all]
nms-iou-threshold=0.6
pre-cluster-threshold=0.3
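For reference, cluster-mode=2 with the two [class-attrs-all] values above means: candidates below pre-cluster-threshold are dropped first, then greedy NMS suppresses overlapping boxes whose IoU exceeds nms-iou-threshold. This is a rough pure-Python sketch of that post-processing logic (illustrative only, not the actual nvinfer implementation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def cluster(dets, pre_cluster_threshold=0.3, nms_iou_threshold=0.6):
    """dets: list of (box, score). Drop low scores, then greedy NMS."""
    dets = [d for d in dets if d[1] >= pre_cluster_threshold]
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        # Keep a box only if it does not overlap a higher-scoring
        # kept box by more than the NMS IoU threshold.
        if all(iou(box, k[0]) <= nms_iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

Lowering pre-cluster-threshold only changes how many low-confidence candidates survive the first filter, which is consistent with the suspicion that one of those extra boxes carries malformed data, rather than the clustering itself being at fault.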

And here is the pipeline creation code in Python:

def main(args):
    """
    makes elements, adds them to pipeline, links them together, runs the main loop
    :param args: python command arguments, including file name
    """
    # Make sure no arguments are sent through terminal
    if len(args) != 1:
        sys.stderr.write(f"please input all your arguments in {prim_config_path}\n")
        sys.exit(1)

    is_live = bool(int(configFile['streammux']['live-source']))
    number_sources = int(configFile['streammux']['batch-size'])
    sources_in_configfile = len([s for s in configFile.sections() if "source" in s])

    # Standard GStreamer initialization
    GObject.threads_init()
    Gst.init(None)

    # Create gstreamer elements
    # Create Pipeline element that will form a connection of other elements
    print("Creating Pipeline \n ")
    pipeline = Gst.Pipeline()

    if not pipeline:
        sys.stderr.write(" Unable to create Pipeline \n")
    print("Creating streammux \n ")

    # Create nvstreammux instance to form batches from one or more sources.
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        sys.stderr.write(" Unable to create NvStreamMux \n")

    pipeline.add(streammux)
    for i in range(number_sources):
        try:
            frame_count[f"stream_{i}"] = 0
            saved_count[f"stream_{i}"] = 0
            fps_streams[f"stream_{i}"] = GETFPS(i)
            print("Creating source_bin ", i, " \n ")
            uri_name = configFile[f'source{i}']['uri']
            if uri_name.find("http://") == 0:
                is_live = True
            source_bin = create_source_bin(i, uri_name)
            if not source_bin:
                sys.stderr.write("Unable to create source bin \n")
            pipeline.add(source_bin)
            pad_name = f"sink_{i}"
            sink_pad = streammux.get_request_pad(pad_name)
            if not sink_pad:
                sys.stderr.write("Unable to create sink pad bin \n")
            src_pad = source_bin.get_static_pad("src")
            if not src_pad:
                sys.stderr.write("Unable to create src pad bin \n")
            src_pad.link(sink_pad)
            fps = int(configFile[f"source{i}"]["fps"])


        except KeyError:
            print("\n\n Number of input sources doesn't match the batch-size"
                  " specified in the config file! The system will start up with the available streams.\n\n")
            break

    print("Creating Pgie \n ")
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not pgie:
        sys.stderr.write(" Unable to create pgie \n")
    # Add nvvidconv1 and filter1 to convert the frames to RGBA
    # which is easier to work with in Python.
    print("Creating nvvidconv1 \n ")
    nvvidconv1 = Gst.ElementFactory.make("nvvideoconvert", "convertor1")
    if not nvvidconv1:
        sys.stderr.write(" Unable to create nvvidconv1 \n")
    print("Creating filter1 \n ")
    caps1 = Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA")
    filter1 = Gst.ElementFactory.make("capsfilter", "filter1")
    if not filter1:
        sys.stderr.write(" Unable to get the caps filter1 \n")
    filter1.set_property("caps", caps1)
    print("Creating tiler \n ")
    tiler = Gst.ElementFactory.make("nvmultistreamtiler", "nvtiler")
    if not tiler:
        sys.stderr.write(" Unable to create tiler \n")
    print("Creating nvvidconv \n ")
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
    if not nvvidconv:
        sys.stderr.write(" Unable to create nvvidconv \n")
    print("Creating nvosd \n ")
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    if not nvosd:
        sys.stderr.write(" Unable to create nvosd \n")
    if is_aarch64():
        print("Creating transform \n ")
        transform = Gst.ElementFactory.make("nvegltransform", "nvegl-transform")
        if not transform:
            sys.stderr.write(" Unable to create transform \n")

    print("Creating EGLSink \n")
    display = int(configFile['streammux']['allow-display'])
    if display:
        sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
    else:
        sink = Gst.ElementFactory.make("fakesink", "fakesink")
    if not sink:
        sys.stderr.write(" Unable to create egl sink \n")

    # if is_live:
    #     print("Atleast one of the sources is live")
    #     streammux.set_property('live-source', 1)

    streammux.set_property('width', 1920)
    streammux.set_property('height', 1080)
    streammux.set_property('batch-size', number_sources)
    streammux.set_property('batched-push-timeout', 4000000)
    pgie.set_property('config-file-path', prim_config_path)
    pgie_batch_size = pgie.get_property("batch-size")
    if pgie_batch_size != number_sources:
        print("WARNING: Overriding infer-config batch-size", pgie_batch_size, " with number of sources ",
              number_sources, " \n")
        pgie.set_property("batch-size", number_sources)
    tiler_rows = int(math.sqrt(number_sources))
    tiler_columns = int(math.ceil((1.0 * number_sources) / tiler_rows))
    tiler.set_property("rows", tiler_rows)
    tiler.set_property("columns", tiler_columns)
    tiler.set_property("width", TILED_OUTPUT_WIDTH)
    tiler.set_property("height", TILED_OUTPUT_HEIGHT)
    if is_live:
        sink.set_property("sync", 1)
    else:
        sink.set_property("sync", 0)

    if not is_aarch64():
        # Use CUDA unified memory in the pipeline so frames
        # can be easily accessed on CPU in Python.
        mem_type = int(pyds.NVBUF_MEM_CUDA_UNIFIED)
        streammux.set_property("nvbuf-memory-type", mem_type)
        nvvidconv.set_property("nvbuf-memory-type", mem_type)
        nvvidconv1.set_property("nvbuf-memory-type", mem_type)
        tiler.set_property("nvbuf-memory-type", mem_type)

    print("Adding elements to Pipeline \n")
    pipeline.add(pgie)
    pipeline.add(tiler)
    pipeline.add(nvvidconv)
    pipeline.add(filter1)
    pipeline.add(nvvidconv1)
    pipeline.add(nvosd)
    if is_aarch64():
        pipeline.add(transform)
    pipeline.add(sink)

    print("Linking elements in the Pipeline \n")
    streammux.link(pgie)
    pgie.link(nvvidconv1)
    nvvidconv1.link(filter1)
    filter1.link(tiler)
    tiler.link(nvvidconv)
    nvvidconv.link(nvosd)
    if is_aarch64():
        nvosd.link(transform)
        transform.link(sink)
    else:
        nvosd.link(sink)

    # create an event loop and feed gstreamer bus messages to it
    loop = GObject.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    tiler_sink_pad = tiler.get_static_pad("sink")
    if not tiler_sink_pad:
        sys.stderr.write(" Unable to get tiler sink pad \n")
    else:
        tiler_sink_pad.add_probe(Gst.PadProbeType.BUFFER, tiler_sink_pad_buffer_probe, 0)

    # List the sources
    print("Now playing...")
    for i, source in enumerate(args[:-1]):
        if i != 0:
            print(i, ": ", source)

    print("Starting pipeline \n")
    # start playback and listen to events
    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    except KeyboardInterrupt:
        pass
    # cleanup
    print("Exiting app\n")
    pipeline.set_state(Gst.State.NULL)
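Given the suspicion that a low-confidence detection with malformed coordinates is the trigger, a defensive check on the parsed boxes can help confirm it: log and drop any degenerate rectangle before it reaches downstream elements. This is a plain-Python sketch of the kind of validation one could add in the bbox-parsing path or a pad probe; sanitize_box is a hypothetical helper, not part of pyds:

```python
import math

def sanitize_box(left, top, width, height, frame_w=1920, frame_h=1080):
    """Clamp a detection rectangle into the frame; return None if it is
    degenerate (NaN/inf, non-positive size, or entirely outside)."""
    if any(math.isnan(v) or math.isinf(v) for v in (left, top, width, height)):
        return None
    right, bottom = left + width, top + height
    # Clamp all four edges to the frame.
    l = max(0.0, min(left, float(frame_w)))
    t = max(0.0, min(top, float(frame_h)))
    r = max(0.0, min(right, float(frame_w)))
    b = max(0.0, min(bottom, float(frame_h)))
    # Reject boxes that collapse to (almost) nothing after clamping.
    if r - l < 1.0 or b - t < 1.0:
        return None
    return l, t, r - l, b - t
```

If the crash disappears once out-of-range boxes are rejected this way, that would point at a downstream element dereferencing invalid rectangle metadata rather than at the model itself.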

• Hardware Platform: GPU
• DeepStream Version: 5.0
• NVIDIA GPU Driver Version (valid for GPU only): 460
• Issue Type: question


Can this application work with deepstream sample models?

Yeah, it does work with the config file for the sample primary detector model
(it does detect on, and pass through, the crashing frame).

But I don't know if it's the model, because I got the YOLO model to work when I changed the confidence last time.

Is there any problem with c sample deepstream-app with the modified confidence value? yolov4_deepstream/deepstream_yolov4 at master · NVIDIA-AI-IOT/yolov4_deepstream (github.com)

There is no problem when using the C sample.
I downloaded that project, changed it to use my YOLO model and the parsing function I usually use with it, and it ran all the way through at every confidence level (using the same input video).

I also tried commenting out the call to my logic, and it still crashed at the same point.

This issue showed up again in a different project using the same model and parsing function that worked fine in C++.
I tried removing the function call to my logic and it still crashed at a specific frame number, and it stopped crashing when I raised pre-cluster-threshold from 0.2 to 0.5.

I want to know why this is happening, because I fear it could happen at higher confidences on other streams/videos.
@bcao

@Fiona.Chen @kayccc Is there any update?


I am facing pretty much the same problem, though on a different project. The video that produced this error was working fine before (using the KLT tracker); then I changed the tracker to NvDCF and it also worked fine, but when I changed back to KLT it no longer worked.

I don't know whether this is a DeepStream problem or a hardware problem.


Hi,
I used the YOLOv4 etlt model and the C version sample deepstream_tlt_apps to try to reproduce your issue. I tried all nms-iou-threshold values from 0.1 to 0.9
and all pre-cluster-threshold values from 0.1 to 0.9,
but I cannot reproduce your issue.

This issue showed up again in a different project using the same model and parsing function that worked fine in C++.
I tried removing the function call to my logic and it still crashed at a specific frame number, and it stopped crashing when I raised pre-cluster-threshold from 0.2 to 0.5.

→ And what is the result with pre-cluster-threshold set to 0.5 using your model in that different project?

I shared my model with the user who commented right before you; the model was used on a different project and it crashed there too.

I have also had similar issues on other projects before, with pre-cluster-threshold set to 0.5 or less.

Let's list the cases you tested:
1. project 1, your model: does not work
2. project 1, built-in model: works
3. C project (yolov4), your model: works
4. project 2, your model: does not work; stopped crashing when pre-cluster-threshold was raised from 0.2 to 0.5
Since I cannot reproduce your issue, please figure out the difference that causes the error.