NVIDIA gstreamer element segfaults with `Gst.debug_add_log_function` race condition

The following two pipelines achieve the same result (but the second uses NVIDIA GStreamer elements):

NO_NV="""
    filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4
    ! qtdemux
    ! h264parse
    ! avdec_h264
    ! xvimagesink
"""

NV="""
    filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4
    ! qtdemux
    ! h264parse
    ! nvv4l2decoder
    ! nvvideoconvert
    ! xvimagesink
"""
  • I’m running gst-launch from Python’s subprocess, and both pipelines work fine.
  • They also work fine when launched from Python using Gst.parse_launch.

I’m trying to attach a custom logger using Gst.debug_add_log_function (docs here), but the NV pipeline segfaults when I do. With GST_DEBUG=2:

Opening in BLOCKING MODE 
0:00:00.093462508 27987      0x554b750 WARN                    v4l2 gstv4l2object.c:4430:gst_v4l2_object_probe_caps:<nvv4l2decoder0:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Unknown error -1
0:00:00.093555696 27987      0x554b750 WARN                    v4l2 gstv4l2object.c:2372:gst_v4l2_object_add_interlace_mode:0x54e1090 Failed to determine interlace mode
0:00:00.094160457 27987      0x554b750 WARN                 basesrc gstbasesrc.c:3583:gst_base_src_start_complete:<filesrc0> pad not activated yet
0:00:00.094888999 27987      0x55500f0 WARN                 qtdemux qtdemux_types.c:233:qtdemux_type_get: unknown QuickTime node type sgpd
0:00:00.094950666 27987      0x55500f0 WARN                 qtdemux qtdemux_types.c:233:qtdemux_type_get: unknown QuickTime node type sbgp
0:00:00.095009644 27987      0x55500f0 WARN                 qtdemux qtdemux.c:3031:qtdemux_parse_trex:<qtdemux0> failed to find fragment defaults for stream 1
0:00:00.095199828 27987      0x55500f0 WARN                 qtdemux qtdemux.c:3031:qtdemux_parse_trex:<qtdemux0> failed to find fragment defaults for stream 2
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 


{'user_data': (), 'message': <Gst.DebugMessage object at 0x7f950a9e58 (void at 0x7f89d17828)>, 'source': <Gst.Pad object at 0x7f950ac0d8 (GstPad at 0x545ad30)>, 'dline': 4430, 'dfctn': 'gst_v4l2_object_probe_caps', 'dfile': '/dvs/git/dirty/git-master_linux/3rdparty/gst/gst-v4l2/gst-v4l2/gstv4l2object.c', 'level': <enum GST_LEVEL_WARNING of type Gst.DebugLevel>, 'category': <Gst.DebugCategory object at 0x7f9509fc78 (void at 0x522fba0)>}

0:00:00.206069582 27987      0x55500f0 WARN                    v4l2 gstv4l2object.c:4430:gst_v4l2_object_probe_caps:<nvv4l2decoder0:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Unknown error -1
sys:1: Warning: g_object_is_floating: assertion 'G_IS_OBJECT (object)' failed
sys:1: Warning: g_object_get_qdata: assertion 'G_IS_OBJECT (object)' failed
Segmentation fault (core dumped)
  • I need to monitor these debug logs to parse some messages which are not posted to the bus (messages on the bus would be parseable inside bus_call, see code below).
  • The crash only happens when using the NV pipe, which uses NVIDIA’s GStreamer elements.
  • Note that the callback gets called at least once before the crash.
  • GST_DEBUG<=1 obviously avoids the segfault, but it also silences my custom logger.
  • The behavior of the log function does not matter (a no-op or a lambda also produces the segfault).
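The debug lines above follow GStreamer’s standard text log layout (timestamp, PID, thread pointer, level, category, file:line:function:<object> message). As a fallback for the messages I need, the same fields can be recovered from gst-launch’s stderr with a regex; this is only a sketch of that idea (the regex and function name are mine, not part of any GStreamer API):

```python
import re

# One GST_DEBUG line looks like:
#   0:00:00.093462508 27987  0x554b750 WARN  v4l2 gstv4l2object.c:4430:func:<obj> message
GST_LOG_RE = re.compile(
    r"^(?P<ts>\d+:\d{2}:\d{2}\.\d+)\s+"     # running timestamp
    r"(?P<pid>\d+)\s+"                       # process id
    r"(?P<thread>0x[0-9a-fA-F]+)\s+"         # thread pointer
    r"(?P<level>[A-Z]+)\s+"                  # debug level (WARN, ERROR, ...)
    r"(?P<category>\S+)\s+"                  # debug category (v4l2, qtdemux, ...)
    r"(?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+):"  # source location
    r"(?:<(?P<obj>[^>]+)>\s*)?"              # optional <object> name
    r"(?P<msg>.*)$"                          # the actual message
)

def parse_gst_log_line(line):
    """Return a dict of log fields, or None if the line is not a debug line."""
    m = GST_LOG_RE.match(line)
    return m.groupdict() if m else None
```

This only sees what GStreamer prints to stderr, not the structured objects the callback receives, but it avoids touching the in-process log hooks entirely.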

To avoid it I can

  1. Sleep > 200 ms after the pipeline state is set to Gst.State.PLAYING. The limit seems to be around 108 ms on a system without external load: running the same command over and over while sleeping exactly this amount of time, it will sometimes crash. Sleeping longer than this “guarantees” the crash won’t happen.
  2. Add the log function from a callback, e.g. using GObject.timeout_add_seconds to run it after 1 sec (running it with a timeout of 0 will sometimes also produce the crash). This skips the first message, which would be received if the log function was added in the main thread before setting the pipeline’s state to Gst.State.PLAYING.
  3. In both cases the log function does get called as expected for subsequent messages (though I lose whatever would be logged before the delay).

Questions

  1. Is this expected behavior? Why does this happen only when using NVIDIA’s elements?
  2. Is the current workaround robust (Option 2)? Are there any alternatives to this “solution”?

Code

#!/usr/bin/env python3

import shlex
import subprocess
import sys

import gi
from gi.repository import GLib
from gi.repository import GObject
gi.require_version("Gst", "1.0")
from gi.repository import Gst

NO_NV="""
    filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4
    ! qtdemux
    ! h264parse
    ! avdec_h264
    ! xvimagesink
"""

NV="""
    filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4
    ! qtdemux
    ! h264parse
    ! nvv4l2decoder
    ! nvvideoconvert
    ! xvimagesink
"""

def on_message(category, level, dfile, dfctn, dline, source, message, *user_data):
    print(f"\n\n{locals()}\n")
    return True

def bus_call(bus, message, loop):
    t = message.type
    if t == Gst.MessageType.EOS:
        sys.stdout.write("End-of-stream\n")
        loop.quit()
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        sys.stderr.write("Error: %s: %s\n" % (err, debug))
        loop.quit()
    return True

def add_log():
    Gst.debug_add_log_function(on_message)
    return True

def python(pipe, mode):

    Gst.init(None)
    GObject.threads_init()
    mainloop = GObject.MainLoop()

    pipeline = Gst.parse_launch(pipe)
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, mainloop)
    pipeline.set_state(Gst.State.PLAYING)
    if mode == "main":
        add_log()
    elif mode == "sleep":
        from time import sleep
        sleep(0.2)
        add_log()
    elif mode == "callback":
        GObject.timeout_add_seconds(1, add_log)
    else:
        print("Not adding custom log function")

    mainloop.run()

def gstlaunch(pipe):
    return subprocess.check_call(shlex.split(f"gst-launch-1.0 {pipe}"))


if __name__ == "__main__":
    # gstlaunch(NO_NV)  # This runs ok
    # gstlaunch(NV)  # This runs ok
    # python(NO_NV, "nolog")  # This runs ok
    # python(NV, "nolog")  # This runs ok
    # python(NO_NV, "main")  # This runs ok
    python(NV, "main")  # This crashes
    # python(NV, "sleep")  # This runs OK
    # python(NV, "callback")  # This runs OK

HW

• Hardware Platform (Jetson / GPU) Jetson Xavier
• DeepStream Version 5
• JetPack Version (valid for Jetson only) 4.4
• TensorRT Version 7.1

I’ve tried your Python script on a T4 and a Jetson NX board; the script works well.

Please provide complete information as applicable to your setup.

• Hardware Platform
• DeepStream Version
• JetPack Version
• TensorRT Version
• Issue Type (bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

BTW, it is better to add “pipeline.set_state(Gst.State.NULL)” after “mainloop.run()”.

Tried the same script, now using another Xavier (specs below), and it works OK. I’m going to check back on Monday with the original device.

  • NVIDIA Jetson AGX Xavier [16GB]
    • Jetpack 4.4 [L4T 32.4.3]
    • CUDA: 10.2.89
    • cuDNN: 8.0.0.180
    • TensorRT: 7.1.3.0

Tried with the new Jetson and the code checks out (the only difference is a full system reboot).

If it appears again, I’ll comment here…