Unstable DeepStream RTSP pipeline - CUDA runtime error

Hardware Platform: NVIDIA Jetson Orin Nano Dev Kit (Super)
DeepStream Version: 7.1
JetPack Version: 6.1 (rev1)
TensorRT Version: 10.3.0.30
CUDA: 12.6.68

Running the DeepStream Python bindings with a CCTV RTSP feed as the source. The pipeline is not stable: after some time of running inference, it drops out.

Here is the error:

Frame Number= 434
FPS: 17.44
Frame Number= 435
FPS: 30.35
/dvs/git/dirty/git-master_linux/nvutils/nvbufsurftransform/nvbufsurftransform_copy.cpp:438: => Failed in mem copy

ERROR: Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:19.977832968  6123 0xaaaae1c20e40 WARN                 nvinfer gstnvinfer.cpp:1420:gst_nvinfer_input_queue_loop:<primary-inference> error: Failed to queue input batch for inferencing
Frame Number= 436
Error: gst-stream-error-quark: Failed to queue input batch for inferencing (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1420): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference
FPS: 62.61
libnvosd (1386):(ERROR) : cuGraphicsEGLRegisterImage failed : 700 
0:00:19.987670432  6123 0xaaaae1c20de0 WARN                 nvinfer gstnvinfer.cpp:2423:gst_nvinfer_output_loop:<primary-inference> error: Internal data stream error.
0:00:19.987721695  6123 0xaaaae1c20de0 WARN                 nvinfer gstnvinfer.cpp:2423:gst_nvinfer_output_loop:<primary-inference> error: streaming stopped, reason error (-5)
CUDA Runtime error cudaFreeHost(host_) # an illegal memory access was encountered, code = cudaErrorIllegalAddress [ 700 ] in file /dvs/git/dirty/git-master_linux/deepstream/sdk/src/utils/nvll_osd/memory.hpp:78
CUDA Runtime error cudaFree(device_) # an illegal memory access was encountered, code = cudaErrorIllegalAddress [ 700 ] in file /dvs/git/dirty/git-master_linux/deepstream/sdk/src/utils/nvll_osd/memory.hpp:79
(the cudaFreeHost/cudaFree error pair above repeats several more times)

Here is the Python code:

#!/usr/bin/env python3

import sys
sys.path.append("../")
from common.bus_call import bus_call
from common.platform_info import PlatformInfo
import pyds
import math
import time
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstRtspServer", "1.0")
from gi.repository import Gst, GstRtspServer, GLib
import argparse
#import alerts
#import alert

MAX_DISPLAY_LEN = 64
MUXER_OUTPUT_WIDTH = 640  
MUXER_OUTPUT_HEIGHT = 640  
MUXER_BATCH_TIMEOUT_USEC = 40000  
TILED_OUTPUT_WIDTH = 640
TILED_OUTPUT_HEIGHT = 640
GST_CAPS_FEATURES_NVMM = "memory:NVMM"
MIN_CONFIDENCE = 0.55  

last_frame_time = 0
fire_detected = False
consecutive_fire = 0  

def pgie_src_pad_buffer_probe(pad, info, u_data):
    global last_frame_time, fire_detected, consecutive_fire
    frame_number = 0
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return Gst.PadProbeReturn.DROP

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        print("Unable to get batch metadata")
        return Gst.PadProbeReturn.DROP

    l_frame = batch_meta.frame_meta_list
    current_time = time.time()

    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_number = frame_meta.frame_num
        print("Frame Number=", frame_number)

        if last_frame_time != 0:
            fps = 1.0 / (current_time - last_frame_time)
            print(f"FPS: {fps:.2f}")
        last_frame_time = current_time

        l_obj = frame_meta.obj_meta_list
        fire_in_frame = False  # Track if fire with sufficient confidence is detected

        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break

            class_id = obj_meta.class_id
            confidence = obj_meta.confidence
            label = "Object"
            if class_id == 0:  
                label = "Fire"
                if confidence >= MIN_CONFIDENCE:
                    fire_in_frame = True  
            print(f"Detected: {label} with Confidence: {confidence:.2f}")

            try:
                l_obj = l_obj.next
            except StopIteration:
                break

        if fire_in_frame:
            consecutive_fire += 1
            print(f"Consecutive fire detections (confidence >= {MIN_CONFIDENCE}): {consecutive_fire}")
        else:
            consecutive_fire = 0  

        if consecutive_fire >= 30 and not fire_detected:
            #alerts.trig_on()
            #alert.trigger_pin1()
            fire_detected = True
            print(f"Fire confirmed after 10 consecutive detections with confidence >= {MIN_CONFIDENCE}, activating buzzer and red LED")
            time.sleep(2)
            #fire_detected = False
            #consecutive_fire = 0
            
        elif not fire_in_frame and fire_detected:
            #alerts.trig_off()
            #alert.trigger_pin2()
            fire_detected = False
            consecutive_fire = 0  
            print("Fire no longer detected or confidence too low, reverting to green LED on")

        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

def cb_newpad(decodebin, decoder_src_pad, data):
    print("In cb_newpad\n")
    caps = decoder_src_pad.get_current_caps()
    gststruct = caps.get_structure(0)
    gstname = gststruct.get_name()
    source_bin = data
    features = caps.get_features(0)

    print("gstname=", gstname)
    if gstname.find("video") != -1:
        print("features=", features)
        if features.contains("memory:NVMM"):
            bin_ghost_pad = source_bin.get_static_pad("src")
            if not bin_ghost_pad.set_target(decoder_src_pad):
                sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")
        else:
            sys.stderr.write("Error: Decodebin did not pick nvidia decoder plugin.\n")

def decodebin_child_added(child_proxy, Object, name, user_data):
    print("Decodebin child added:", name, "\n")
    if name.find("decodebin") != -1:
        Object.connect("child-added", decodebin_child_added, user_data)
    if name.find("source") != -1:
        Object.set_property("timeout", 30)

def create_source_bin(index, uri):
    print("Creating source bin")
    bin_name = "source-bin-%02d" % index
    print(bin_name)
    nbin = Gst.Bin.new(bin_name)
    if not nbin:
        sys.stderr.write(" Unable to create source bin \n")

    uri_decode_bin = Gst.ElementFactory.make("uridecodebin", "uri-decode-bin")
    if not uri_decode_bin:
        sys.stderr.write(" Unable to create uri decode bin \n")
    uri_decode_bin.set_property("uri", uri)
    uri_decode_bin.connect("pad-added", cb_newpad, nbin)
    uri_decode_bin.connect("child-added", decodebin_child_added, nbin)

    Gst.Bin.add(nbin, uri_decode_bin)
    bin_pad = nbin.add_pad(Gst.GhostPad.new_no_target("src", Gst.PadDirection.SRC))
    if not bin_pad:
        sys.stderr.write(" Failed to add ghost pad in source bin \n")
        return None
    return nbin

def main(args):
    global fire_detected, consecutive_fire
    number_sources = len(args)
    platform_info = PlatformInfo()
    Gst.init(None)
    #alerts.trig_off()
    fire_detected = False  
    consecutive_fire = 0   

    print("Creating Pipeline \n ")
    pipeline = Gst.Pipeline()
    if not pipeline:
        sys.stderr.write(" Unable to create Pipeline \n")

    print("Creating streamux \n ")
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        sys.stderr.write(" Unable to create NvStreamMux \n")

    pipeline.add(streammux)
    for i in range(number_sources):
        print("Creating source_bin ", i, " \n ")
        uri_name = args[i]
        source_bin = create_source_bin(i, uri_name)
        if not source_bin:
            sys.stderr.write("Unable to create source bin \n")
        pipeline.add(source_bin)
        padname = "sink_%u" % i
        sinkpad = streammux.request_pad_simple(padname)
        if not sinkpad:
            sys.stderr.write("Unable to create sink pad bin \n")
        srcpad = source_bin.get_static_pad("src")
        if not srcpad:
            sys.stderr.write("Unable to create src pad bin \n")
        srcpad.link(sinkpad)

    print("Creating Pgie \n ")
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not pgie:
        sys.stderr.write(" Unable to create pgie \n")

    print("Creating nvvidconv \n ")
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
    if not nvvidconv:
        sys.stderr.write(" Unable to create nvvidconv \n")

    print("Creating tiler \n ")
    tiler = Gst.ElementFactory.make("nvmultistreamtiler", "nvtiler")
    if not tiler:
        sys.stderr.write(" Unable to create tiler \n")

    print("Creating nvosd \n ")
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    if not nvosd:
        sys.stderr.write(" Unable to create nvosd \n")

    nvvidconv_postosd = Gst.ElementFactory.make("nvvideoconvert", "convertor_postosd")
    if not nvvidconv_postosd:
        sys.stderr.write(" Unable to create nvvidconv_postosd \n")

    caps = Gst.ElementFactory.make("capsfilter", "filter")
    if not caps:
        sys.stderr.write(" Unable to create capsfilter \n")
    caps.set_property("caps", Gst.Caps.from_string("video/x-raw, format=I420"))

    if codec == "H264":
        encoder = Gst.ElementFactory.make("x264enc", "encoder")
        print("Creating H264 Encoder (software)")
    elif codec == "H265":
        encoder = Gst.ElementFactory.make("x265enc", "encoder")
        print("Creating H265 Encoder (software)")
    if not encoder:
        sys.stderr.write(" Unable to create encoder \n")
        sys.exit(1)
    encoder.set_property("bitrate", bitrate)
    encoder.set_property("speed-preset", "ultrafast")

    if codec == "H264":
        rtppay = Gst.ElementFactory.make("rtph264pay", "rtppay")
        print("Creating H264 rtppay")
    elif codec == "H265":
        rtppay = Gst.ElementFactory.make("rtph265pay", "rtppay")
        print("Creating H265 rtppay")
    if not rtppay:
        sys.stderr.write(" Unable to create rtppay \n")
        sys.exit(1)

    sink = Gst.ElementFactory.make("udpsink", "udpsink")
    if not sink:
        sys.stderr.write(" Unable to create udpsink \n")
        sys.exit(1)
    sink.set_property("host", "127.0.0.1")  # Changed to localhost for stability
    sink.set_property("port", 5400)
    sink.set_property("async", False)
    sink.set_property("sync", 1)

    streammux.set_property("width", MUXER_OUTPUT_WIDTH)
    streammux.set_property("height", MUXER_OUTPUT_HEIGHT)
    streammux.set_property("batch-size", 1)  # Force single stream to reduce load
    streammux.set_property("batched-push-timeout", MUXER_BATCH_TIMEOUT_USEC)
    streammux.set_property("live-source", 1)

    pgie.set_property("config-file-path", "/opt/nvidia/deepstream/deepstream-7.1/sources/deepstream_python_apps/apps/deepstream-rtsp-in-rtsp-out/dstest1_pgie_config.txt")
    pgie.set_property("batch-size", 1)  # Match streammux batch-size

    tiler_rows = int(math.sqrt(number_sources))
    tiler_columns = int(math.ceil((1.0 * number_sources) / tiler_rows))
    tiler.set_property("rows", tiler_rows)
    tiler.set_property("columns", tiler_columns)
    tiler.set_property("width", TILED_OUTPUT_WIDTH)
    tiler.set_property("height", TILED_OUTPUT_HEIGHT)

    print("Adding elements to Pipeline \n")
    pipeline.add(pgie)
    pipeline.add(nvvidconv)
    pipeline.add(tiler)
    pipeline.add(nvosd)
    pipeline.add(nvvidconv_postosd)
    pipeline.add(caps)
    pipeline.add(encoder)
    pipeline.add(rtppay)
    pipeline.add(sink)

    streammux.link(pgie)
    pgie.link(tiler)
    tiler.link(nvvidconv)
    nvvidconv.link(nvosd)
    nvosd.link(nvvidconv_postosd)
    nvvidconv_postosd.link(caps)
    caps.link(encoder)
    encoder.link(rtppay)
    rtppay.link(sink)

    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    pgie_src_pad = pgie.get_static_pad("src")
    if not pgie_src_pad:
        sys.stderr.write(" Unable to get src pad \n")
    else:
        pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pgie_src_pad_buffer_probe, 0)

    rtsp_port_num = 8554
    server = GstRtspServer.RTSPServer.new()
    server.props.service = "%d" % rtsp_port_num
    server.attach(None)

    factory = GstRtspServer.RTSPMediaFactory.new()
    factory.set_launch(
        '( udpsrc name=pay0 port=5400 buffer-size=524288 caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=(string)%s, payload=96 " )'
        % codec
    )
    factory.set_shared(True)
    server.get_mount_points().add_factory("/ds-test", factory)

    print("\n *** DeepStream: Launched RTSP Streaming at rtsp://localhost:%d/ds-test ***\n\n" % rtsp_port_num)

    print("Starting pipeline \n")
    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    except KeyboardInterrupt:
        # Allow Ctrl+C to fall through to the pipeline teardown below.
        pass
    pipeline.set_state(Gst.State.NULL)
    #alerts.cleanup()
    print("GPIO cleaned up")

def parse_args():
    parser = argparse.ArgumentParser(description='RTSP Output Sample Application Help ')
    parser.add_argument("-i", "--input", help="Path to input RTSP stream", nargs="+", required=True)
    parser.add_argument("-c", "--codec", default="H264", help="RTSP Streaming Codec H264/H265", choices=['H264', 'H265'])
    parser.add_argument("-b", "--bitrate", default=2000, help="Set the encoding bitrate in kbps", type=int)  # Reduced bitrate
    args = parser.parse_args()
    global codec
    global bitrate
    codec = args.codec
    bitrate = args.bitrate
    return args.input

if __name__ == '__main__':
    stream_path = parse_args()
    sys.exit(main(stream_path))
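
One thing worth flagging in the probe above: time.sleep(2) runs on the GStreamer streaming thread and stalls buffer flow for two seconds, which is risky on a live RTSP source. Below is a minimal sketch of deferring the alert to a worker thread instead; schedule_fire_alert and _fire_alert are hypothetical names, and the alert body stands in for the commented-out alerts/alert GPIO hooks.

import threading

def _fire_alert():
    # Hypothetical alert body; replace with the real GPIO/buzzer calls,
    # e.g. the commented-out alerts.trig_on() / alert.trigger_pin1().
    print("Fire alert triggered")

def schedule_fire_alert():
    # Run the slow alert I/O on a daemon thread so the pad probe returns
    # immediately and the streaming thread is never blocked.
    threading.Thread(target=_fire_alert, daemon=True).start()

# In pgie_src_pad_buffer_probe, replace time.sleep(2) with:
#     schedule_fire_alert()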

Note: the same feed and model work well with deepstream-app, which is stable for hours, even with multiple sources. The error occurs when running with the Python bindings: it drops out within a few minutes, and that is with a single source only.

Can you upload “/opt/nvidia/deepstream/deepstream-7.1/sources/deepstream_python_apps/apps/deepstream-rtsp-in-rtsp-out/dstest1_pgie_config.txt” too? Is there any customized postprocessing?

What is your Jetson Orin Nano's memory size?

Can the original sample, apps/deepstream-test4 from NVIDIA-AI-IOT/deepstream_python_apps (v1.2.0), work on your platform? Can you first try your model with that sample, without your pgie probe function?

Thanks for your response.

Here is the config file:

[property]
gpu-id=0
net-scale-factor=0.00392156862745098
onnx-file=/opt/nvidia/deepstream/deepstream-7.1/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out-orin/model.pt.onnx
model-engine-file=/opt/nvidia/deepstream/deepstream-7.1/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out-orin/model.engine
labelfile-path=/opt/nvidia/deepstream/deepstream-7.1/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out-orin/labels.txt
batch-size=1
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=1
interval=0
gie-unique-id=1
scaling-filter=0
scaling-compute-hw=0
cluster-mode=2

# Input configuration
#input-tensor-name=input
#input-tensor-width=640
#input-tensor-height=640
#maintain-aspect-ratio=1
#symmetric-padding=1
#network-type=0

#Output configuration
output-blob-names=output
output-tensor-meta=1
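# NOTE: output-tensor-meta=1 attaches the raw output tensors to the frame meta;
# it is usually unnecessary when parse-bbox-func-name already does the parsing.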
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=/home/supertest/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
pre-cluster-threshold=0.2
topk=20
nms-iou-threshold=0.5

The RAM is 8 GB.

I'm trying test4 as per your suggestion; I will test and post the results.

Can you monitor the memory usage while running the test4 sample?
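
For example, a minimal logger (a sketch: it assumes tegrastats is on the PATH, as on stock JetPack, and that its output includes a "RAM used/totalMB" field, which may vary by release):

import subprocess
import time

# Poll tegrastats once per second and timestamp each line into a log file,
# so memory trends can be correlated with the time of the crash.
proc = subprocess.Popen(
    ["tegrastats", "--interval", "1000"],
    stdout=subprocess.PIPE, text=True,
)
with open("memlog.txt", "w") as log:
    for line in proc.stdout:
        log.write(time.strftime("%H:%M:%S ") + line)
        log.flush()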

Sure, working on it. Another point to add with respect to the main code posted: the usage stats for this particular process show GPU shared-memory consumption of about 155 MB, with total free memory of about 2.5 GB. Sometimes the program drops within a minute and sometimes it runs for more than 5-10 minutes; in both cases the memory consumption trend is identical and stable.

Test4 works well, even with multiple sources.
