Debugging DeepStream Python crashes (segmentation fault)

• Hardware Platform (Jetson)
• DeepStream Version (5.1)
• JetPack Version (4.5.1)
• TensorRT Version (7.1.3)
• Issue Type (question)

We have a custom Python app, built with DeepStream and GStreamer, that runs on both Jetson NX and AGX platforms.

We have gradually been building more features into our software and have now run into random app crashes.

The system takes video feeds from CCTV cameras, so it is a live system.

The app crashes at random with no output other than:

"python3: ../../../../src/cairo-scaled-font.c:1326: cairo_scaled_font_destroy: Assertion `! scaled_font->cache_frozen' failed.
Aborted (core dumped)
"

How can we debug what is causing the crashes?

We ran dmesg and found the following in the output, which may help:

[   15.569802] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   15.569817] Bluetooth: BNEP socket layer initialized
[   16.067328] tegradc 15200000.nvdisplay: unblank
[   16.067338] tegradc 15210000.nvdisplay: blank - powerdown
[   16.067344] tegradc 15220000.nvdisplay: blank - powerdown
[   25.626300] nvmap_alloc_handle: PID 9446: python3: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[   36.190672] vdd-1v8-cvb: disabling
[   36.190681] vdd-1v8-sd: disabling
[   36.190686] vdd-epb-1v0: disabling
[   36.190691] avdd-cam-2v8: disabling
[   36.190696] vdd-sata-1v5: disabling
[   36.190701] vdd-1v8-slt: disabling
[   36.190705] vdd-3v3-slt: disabling
[   36.190710] vdd_sys_en: disabling
[ 1014.729475] nvmap_alloc_handle: PID 23448: python3: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[ 1105.211844] nvmap_alloc_handle: PID 24301: python3: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[ 6026.806841] tegradc 15200000.nvdisplay: blank - p

I will attach the full file:
log.txt (72.8 KB)

Sometimes the system runs for days without crashing.

Any help or advice would be greatly appreciated.

Regards Andrew

cairo is a third-party public library for 2D graphics: https://www.cairographics.org/
DeepStream uses the cairo APIs to write text and draw graphics such as circles, rectangles, …

You may need to check whether your app tries to draw outside the frame (for example, drawing a bbox that extends beyond the video boundary).
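One way to rule this out is to clamp every box to the frame before it reaches the OSD. The following is only a minimal sketch: `clamp_rect`, `frame_w` and `frame_h` are hypothetical names, and it assumes the box is given as left/top/width/height in pixels, as in DeepStream's `rect_params`.

```python
def clamp_rect(left, top, width, height, frame_w, frame_h):
    """Clip a left/top/width/height box so that it lies inside the frame.

    Returns the clipped (left, top, width, height). A box that lies fully
    outside the frame collapses to zero width and/or height.
    """
    # Clamp both corners independently, then rebuild width and height.
    x1 = min(max(left, 0), frame_w)
    y1 = min(max(top, 0), frame_h)
    x2 = min(max(left + width, 0), frame_w)
    y2 = min(max(top + height, 0), frame_h)
    return x1, y1, x2 - x1, y2 - y1
```

In a pad probe the result would be written back to the object's `rect_params` fields, and boxes that collapse to zero size could simply be skipped.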

Thanks for the hint

I'll test and report back.

I have tested this. Even if an object's label is too long, meaning it extends out of the bounds of the render window (not just its stream box), the program does not crash (it usually takes a while for the program to crash).

That was just a suggestion. There is no clear clue that identifies the cause.

Yes, I understand.

Are there any logs where I can see more specific DeepStream issues?

I can't seem to find anything in the GStreamer debug messages.

A core dump means some operation ran without the necessary checks, so the debug log cannot pinpoint where it failed. The crash also happens in C code, so you need to debug the core dump itself. There are plenty of resources on core dump debugging on the internet.
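As a lighter-weight complement to core dump debugging, Python's standard `faulthandler` module can print the Python-level traceback of every thread when the process receives a fatal signal such as SIGSEGV or SIGABRT (an assertion failure ends in `abort()`, which raises SIGABRT). This is only a sketch: `install_crash_tracebacks` and the log path are hypothetical, and the output shows which Python call was active, not the C stack.

```python
import faulthandler
import sys


def install_crash_tracebacks(log_path=None):
    """Dump Python tracebacks of all threads on a fatal signal.

    If log_path is given, tracebacks are written to that file, otherwise to
    stderr. The returned stream must stay open for the life of the process,
    because faulthandler writes to its file descriptor at crash time.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream
```

Calling this once at the very start of the script, before the pipeline is built, should be enough. The same effect is available without code changes by starting the app with `python3 -X faulthandler`.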

I really am struggling to determine the source of the issue.

So it's possible that a Python call could be causing the segfault (via the underlying C code of DeepStream)?

I have a crash log (from apport) that contains a core dump. Once I have unpacked it, the CoreDump is reported as corrupted when I try to get a backtrace using gdb.

_usr_sbin_nvphsd.0.crash (65.4 KB)

I have noticed it looks similar to this post: "system program problem detected: nvphsd" (#3, by liu.jialu).

The only real probe in my pipeline is tiler_sink_pad_probe, which reads the object metadata and makes some basic OSD changes such as object labels and bounding box colors.

We tried running the application under GDB in the hope of uncovering the offending function or call:

gdb python3
(gdb) run /path/to/script.py
## wait for the crash ##
(gdb) backtrace
## stack trace of the C code

as per this link

queue8 is between nvvidconv and nvosd

    streammux.link(queue1)
    queue1.link(pgie)

    pgie.link(queue2)
    queue2.link(tracker)

    tracker.link(queue3)
    queue3.link(analytics)

    analytics.link(queue4)
    queue4.link(nvvidconv1)

    nvvidconv1.link(queue5)
    queue5.link(filter1)

    filter1.link(queue6)
    queue6.link(tiler)

    tiler.link(queue7)
    queue7.link(nvvidconv)

    nvvidconv.link(queue8)
    queue8.link(nvosd)
    # Both branches of the original is_aarch64() check were identical,
    # so the links are made unconditionally.
    nvosd.link(queue9)
    queue9.link(self.osd_interpipe_sink)
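The link calls above ignore their return values, so a refused link would go unnoticed until the pipeline misbehaves. A minimal sketch of a checked alternative follows. `link_chain` is a hypothetical helper, and it only assumes that each element has a `link()` method returning a truthy value on success, as `Gst.Element.link()` does in the Python bindings.

```python
def link_chain(*elements):
    """Link each element to the next one and fail loudly if a link is refused."""
    for upstream, downstream in zip(elements, elements[1:]):
        if not upstream.link(downstream):
            raise RuntimeError(
                f"failed to link {upstream!r} -> {downstream!r}"
            )

# Usage with the elements from the pipeline above, for example:
#   link_chain(streammux, queue1, pgie, queue2, tracker, queue3, analytics)
```

This does not address the crash itself, but it removes one source of silent misconfiguration while the pipeline is being narrowed down.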

I hope someone can make more sense of this than I can…

Regards Andrew

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks

There is still no useful information in the gdb log.