Smart-record property killing the pipeline after a certain time

Please provide complete information as applicable to your setup.

• Hardware Platform -------------> GPU
• DeepStream Version -----------> 7.0
• TensorRT Version --------------> 8.5
• NVIDIA GPU Driver Version ----> 535.230.12

We are running 80 cameras (4 processes × 20 cameras each) on an L4 machine. After running for a certain time, a process gets killed with a segmentation fault.

From each camera we are getting 25 fps, and we extract metadata using pyds.

The moment we set the following smart-record properties:

uri_decode_bin = Gst.ElementFactory.make("nvurisrcbin", "nvurisrcbin")
base_path = f"/opt/nvidia/deepstream/deepstream-7.0/nvodin24/video/{index}_{uuid.uuid4().hex[:8]}"
os.makedirs(base_path, exist_ok=True)
uri_decode_bin.set_property("smart-record", 2)
uri_decode_bin.set_property("smart-rec-dir-path", base_path)
uri_decode_bin.set_property("smart-rec-cache", 20)
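For reference, the setup above can be wrapped in small helpers (our own sketch: the property names and values come from the snippet, while the helper names and the `root` parameter are hypothetical):

```python
import os
import uuid

# Root used in the snippet above (an assumption about the deployment layout).
SR_VIDEO_ROOT = "/opt/nvidia/deepstream/deepstream-7.0/nvodin24/video"


def make_sr_dir(index, root=SR_VIDEO_ROOT):
    """Build and create a unique smart-record directory for one camera."""
    base_path = os.path.join(root, f"{index}_{uuid.uuid4().hex[:8]}")
    os.makedirs(base_path, exist_ok=True)
    return base_path


def apply_smart_record_props(srcbin, base_path, cache_secs=20):
    """Set the smart-record properties on an nvurisrcbin instance.

    The mode value 2 and the property names are those used in the
    snippet above; srcbin is expected to be a Gst element.
    """
    srcbin.set_property("smart-record", 2)
    srcbin.set_property("smart-rec-dir-path", base_path)
    srcbin.set_property("smart-rec-cache", cache_secs)
```

Keeping the directory creation inside one helper also guarantees `smart-rec-dir-path` always points at an existing directory.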

we start getting this critical error:
(python3:35201): GStreamer-CRITICAL **: 07:41:30.684: gst_buffer_get_size: assertion ‘GST_IS_BUFFER (buffer)’ failed

and after some time the process crashes with a core dump:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--c
Core was generated by `python3 test_process.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=127620748412480) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x741205600640 (LWP 1426874))]
(gdb) bt full 
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=127620748412480) at ./nptl/pthread_kill.c:44
        tid = <optimized out>
        ret = 0
        pd = 0x741205600640
        old_mask = {__val = {243, 1613130407189148448, 127637175821872, 289, 98534734174688, 127620748412352, 2, 11, 98534734174688, 98534731873558, 98534731752894, 98534729810424, 0, 47244640256, 123, 0}}
        ret = <optimized out>
        pd = <optimized out>
        old_mask = <optimized out>
        ret = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
        resultvar = <optimized out>
        resultvar = <optimized out>
        __arg3 = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
        __futex = <optimized out>
        resultvar = <optimized out>
        __arg3 = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
        __futex = <optimized out>
        __private = <optimized out>
        __oldval = <optimized out>
        result = <optimized out>
#1  __pthread_kill_internal (signo=11, threadid=127620748412480) at ./nptl/pthread_kill.c:78
No locals.
#2  __GI___pthread_kill (threadid=127620748412480, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
No locals.
#3  0x00007415d896f476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#4  <signal handler called>
No locals.
#5  0x00007415d7cc7f17 in gst_buffer_get_size () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
No symbol table info available.
#6  0x00007415d7ccda68 in gst_buffer_copy_into () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
No symbol table info available.
#7  0x0000741557a651d6 in ?? () from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
No symbol table info available.
#8  0x00007415d7d371d7 in ?? () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
No symbol table info available.
#9  0x00007415d7bbf384 in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib/gthreadpool.c:350
        task = 0x7410dc00b620
        pool = <optimized out>
#10 0x00007415d7bbeac1 in g_thread_proxy (data=0x7414e80291a0) at ../glib/gthread.c:831
        thread = 0x7414e80291a0
        __func__ = "g_thread_proxy"
#11 0x00007415d89c1ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {127634990168848, -1021380715546781404, 127620748412480, 0, 127637177243600, 127634990169200, 1871062135877294372, 1871494816171579684}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#12 0x00007415d8a53850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.



Please help us fix this issue.
@Fiona.Chen @fanzh @junshengy

Are you testing in Docker? Which sample are you testing or referring to? What is the complete media pipeline? Did all four processes crash at the same time, or only one? How long did it run before getting killed? If smart-record is disabled, does the app still crash? Thanks!

  1. We are using Docker.
  2. It's our own code, similar to deepstream-test3.
  3. nvurisrcbin --> streammux --> queue --> infer --> tracker --> analytics --> queue --> appsink
  4. No; one process gets killed after a certain time, and then another one gets killed.
  5. It takes about 45 minutes for one process to go down.
  6. Without smart-record, all processes keep running.

You can see in the two graphs above that when the process got killed there is a drop in memory and, at the same instant, a spike in I/O wait.

@snehashish.debnath @s.Jagannath

From the crash stack, it is related to the decoder. What are the resolution, fps, and video codec of the RTSP sources? Could you share some logs of "nvidia-smi dmon" while running the app? We are wondering about the decoder utilization.

Frame size is (1920, 1080) --> 25 FPS.
The video codec is H.265.
Attaching our nvidia-smi dmon stats:
nvidia_smart_record_dmon.zip (10.2 KB)

Please check the logs for GPU 0.

Observations (using a person model):
For low-fps cameras this issue doesn't seem to happen (not sure).
When the FPS is 25 and there are many objects, at least 20–30 objects per frame per camera, it is more evident.

What do you think could be the issue?
L4_output_gdb.zip (74.4 MB)

We are also giving you the gdb output; inside it you can see the element-wise latency.

As you know, smart-record is used to record the encoded stream; having many objects is not related to smart-record.
What do you do in the appsink? To narrow down this issue, please use fakesink instead.
Many objects will affect the GPU utilization of inference and the tracker. As of now, there is no GPU utilization log from when the app crashes. Could you run "nvidia-smi dmon -o T > 1.log" while running the app? Then please share 1.log and the app log from when the app crashes.
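If it helps, here is a small parser we sketched for the `nvidia-smi dmon -o T` output (the column order, i.e. time, gpu, pwr, gtemp, mtemp, sm, mem, enc, dec, jpg, ofa, mclk, pclk, is assumed from the excerpts posted later in this thread; `parse_dmon_line` and `busy_samples` are hypothetical names):

```python
def parse_dmon_line(line):
    """Parse one `nvidia-smi dmon -o T` sample line into a dict.

    Assumes the column order: time gpu pwr gtemp mtemp sm mem enc dec
    jpg ofa mclk pclk, where '-' means not available. Returns None for
    header or malformed lines.
    """
    fields = line.split()
    if len(fields) < 9 or ":" not in fields[0]:
        return None  # header line ("# ...") or malformed line

    def num(tok):
        return None if tok == "-" else int(tok)

    return {
        "time": fields[0],
        "gpu": num(fields[1]),
        "sm": num(fields[5]),
        "mem": num(fields[6]),
        "dec": num(fields[8]),
    }


def busy_samples(lines, threshold=90):
    """Yield samples whose SM or decoder utilization crosses the threshold."""
    for line in lines:
        sample = parse_dmon_line(line)
        if sample and ((sample["sm"] or 0) >= threshold or
                       (sample["dec"] or 0) >= threshold):
            yield sample
```

Running this over 1.log should make the near-saturation samples easy to correlate with the crash time.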

@fanzh
We will share 1.log with you. Could you clarify what information the app log should contain?

The application's running log from the terminal; we are wondering if there are some error hints in it.

Hi @fanzh

Currently we are running with fakesink instead of appsink, but the same problem occurs: the process gets killed after a certain time.
I am attaching a zip file containing 1.log, the docker stats graph, the Python code for the pipeline, and the related files needed to run it. Could you check the code and reproduce the issue on your end?

It would be really great if you could point out where we are making a mistake with respect to the pipeline!

nvidia_forums_code_24Mar.zip (5.0 MB)

Thanks for sharing! There is no application running log, so we don't know when the application was killed. From the nvidia-smi log, the GPU utilization is sometimes close to 100%. Please refer to this link for performance improvement.

01:01:57      1     66     76      -     99     94      0     64      0      0   6250   1425
01:05:14      0     64     69      -    100     75      0     50      0      0   6250   1545
01:29:05      0     63     69      -     97     93      0     69      0      0   6250   1365 
02:28:44      0     63     69      -     97     94      0     63      0      0   6250   1320 
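Once the application log is captured, a small scanner like this (our own sketch; the log format is assumed to match the GStreamer-CRITICAL line in the issue description) can pull out the timestamps of the critical assertions so they can be lined up against the dmon samples above:

```python
import re

# Matches lines such as:
# (python3:35201): GStreamer-CRITICAL **: 07:41:30.684: gst_buffer_get_size: ...
CRITICAL_RE = re.compile(
    r"GStreamer-CRITICAL \*\*: (?P<time>\d{2}:\d{2}:\d{2}\.\d+): (?P<msg>.*)")


def find_criticals(lines):
    """Return (timestamp, message) pairs for every CRITICAL in the log."""
    hits = []
    for line in lines:
        m = CRITICAL_RE.search(line)
        if m:
            hits.append((m.group("time"), m.group("msg")))
    return hits
```

The first CRITICAL timestamp usually marks when the buffer corruption starts, well before the SIGSEGV.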

As shown in the docker stats graph we provided, if you look closely, after about a day there is a distinct drop in the container's memory. That drop corresponds to the pipeline being killed, at which point we get the core dump. I am attaching a picture of the core dump timestamp below.

Again, please try to reproduce the issue on your end with the provided code. We will attach the app log soon.

Even though you say the GPU utilization is close to 100%, let's not forget that this issue only appears when smart-record is enabled, and in 1.log a value of 100 appeared only once, so we rule that out!

ERROR:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=140696631641664) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7ff67cc00640 (LWP 1644217))]
(gdb) bt full 
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=140696631641664) at ./nptl/pthread_kill.c:44
        tid = <optimized out>
        ret = 0
        pd = 0x7ff67cc00640
        old_mask = {__val = {2175, 9266090568899942233, 140737349464432, 349, 93824997943776, 140696631641536, 2, 11, 93824997943776, 93824995642646, 93824995521982, 93824993579512, 0, 47244640281, 
            18446744073709551615, 0}}
        ret = <optimized out>
        pd = <optimized out>
        old_mask = <optimized out>
        ret = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
        resultvar = <optimized out>
        resultvar = <optimized out>
        __arg3 = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
        __futex = <optimized out>
        resultvar = <optimized out>
        __arg3 = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
        __futex = <optimized out>
        __private = <optimized out>
        __oldval = <optimized out>
        result = <optimized out>
#1  __pthread_kill_internal (signo=11, threadid=140696631641664) at ./nptl/pthread_kill.c:78
No locals.
#2  __GI___pthread_kill (threadid=140696631641664, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
No locals.
#3  0x00007ffff7c92476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#4  <signal handler called>
No locals.
#5  0x00007fff8baffebc in gst_buffer_copy_into () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
No symbol table info available.
#6  0x00007fff90c461d6 in ?? () from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
No symbol table info available.
#7  0x00007fff8bb691d7 in ?? () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
No symbol table info available.
#8  0x00007fff90142384 in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib/gthreadpool.c:350
        task = 0x7ff3884a3050
        pool = <optimized out>
#9  0x00007fff90141ac1 in g_thread_proxy (data=0x7ff9f0041650) at ../glib/gthread.c:831
        thread = 0x7ff9f0041650
        __func__ = "g_thread_proxy"
#10 0x00007ffff7ce4ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140711601109776, -4543113795256716698, 140696631641664, 0, 140737350879184, 140711601110128, 4548467867175875174, 4543130980051448422}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#11 0x00007ffff7d76850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Now I am also attaching the app log:
219_gdb_ouput_24Mar.zip (84.3 MB)

Hi @Fiona.Chen

I hope you are doing well. We are facing the problem described above. Since you have an L4 machine and we are also using an L4, could you please check the code, run it on your machine, and reproduce the issue? It's very important for us to know.
I am sharing all the necessary files required to run the code:
nvidia_24march_all_logs.zip (3.5 MB)

Thank you.

Let's narrow down this issue first.

  1. From the crash stack, the app crashed in the decoder library libgstnvvideo4linux2.so. From nvidia-smi, the GPU utilization is sometimes close to 100%, which is abnormal. If you disable infer, tracker, and analytics, does the issue persist?
  2. If you suspect smart-record is the root cause, then to narrow it down, could you check whether the pipeline "nvurisrcbin --> fakesink" with smart-record crashes?

Following are the results of the experiment you suggested (nvurisrcbin --> streammux --> fakesink):

  1. After disabling infer, tracker, and analytics, the GPU utilization was never close to 100%.
  2. As you suggested, to confirm whether smart-record is the root cause, we ran the pipeline above, and it crashed.

Below are the app log, 1.log, and the docker stats visualisation graph.


L4_output_gdb_25Mar.zip (9.7 MB)

Do you mean some processes still crashed when using "nvurisrcbin --> streammux --> fakesink"? If so, could you share the test code? I will try to reproduce it. Is the crash stack the same as the stack in the issue description?

  1. Yes, some processes still crashed when using "nvurisrcbin --> streammux --> fakesink".
  2. Please find the code attached below.
  3. Yes, the crash stacks are the same as in the issue description. Please look at the core dump to see if it crashes for you too.

nvurisrcbin_to_fakesink.zip (5.3 KB)

  1. Are you testing in the DS 7.0 docker container? The TensorRT version for DS 7.0 should be 8.6.1. Please make sure the component versions meet the requirements of this table.

  2. In the shared code simple_pipeline.py, we noticed that the code does not call start-sr to start recording. Please make sure pyds v1.1.11 is installed for DS 7.0.

  3. To rule out smart record, could you remove the following code from simple_pipeline.py and check whether the app still crashes? The default smart-record mode is 0 (disabled).

    uri_decode_bin.set_property("smart-record", 2)
    os.makedirs(base_path, exist_ok=True)
    uri_decode_bin.set_property("smart-rec-dir-path", base_path)
    uri_decode_bin.set_property("smart-rec-cache", 20)
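As a side note on start-sr: with smart-record mode 2, nothing is actually recorded until the "start-sr" signal is emitted on nvurisrcbin. A minimal trigger sketch, modeled on the deepstream-testsr Python sample (the signal signature, i.e. session id, start offset into the cache, duration, user data, is our reading of that sample; please verify it against your DeepStream version):

```python
def start_recording(srcbin, session_id, start_offset_secs=0, duration_secs=10):
    """Emit the 'start-sr' signal on an nvurisrcbin to begin a recording.

    Modeled on the deepstream-testsr sample: a session id, a start
    offset in seconds back into the smart-rec-cache, a duration in
    seconds, and opaque user data. Verify the exact signature against
    your DeepStream version.
    """
    srcbin.emit("start-sr", session_id, start_offset_secs, duration_secs, None)


def stop_recording(srcbin, session_id):
    """Emit the 'stop-sr' signal to end the recording session."""
    srcbin.emit("stop-sr", session_id)
```

Typically start_recording is called from an event callback (e.g. when an object of interest is detected), with stop_recording or the duration ending the session.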

1 and 2. I think all the dependencies are as they should be; I am attaching a screenshot of the same.

  1. I think we are overlooking the fact that the crash happens the moment we set the smart-record properties in simple_pipeline.py. If we don't set the smart-record properties, i.e. if that code is not there (nvurisrcbin --> streammux --> queue --> infer --> tracker --> analytics --> queue --> fakesink/appsink), it does not fail.

Anyway, as you suggested, we will run the test to rule out smart record: remove that code from simple_pipeline.py and check whether the app still crashes (the default smart-record mode being 0, i.e. disabled).

  1. Are you also facing the same issue when you reproduce it?

This is a duplicate of this topic.
After you test "nvurisrcbin --> streammux --> fakesink" with smart record, please provide the kernel log and the docker service log so we can determine why the program was killed. Thanks!

sudo dmesg  > kernel.log
sudo journalctl -u docker.service > docker.log
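To quickly check whether the kernel's OOM killer terminated the processes (as opposed to the SIGSEGV alone), kernel.log can be scanned for the usual messages (a sketch; the exact message wording varies between kernel versions):

```python
def find_oom_kills(lines):
    """Return kernel-log lines that look like OOM-killer activity.

    Matches the usual messages emitted by recent Linux kernels
    ("Out of memory: Killed process ...", "oom-kill:", "invoked
    oom-killer"); exact wording differs between kernel versions.
    """
    markers = ("Out of memory", "oom-kill", "invoked oom-killer")
    return [line for line in lines if any(m in line for m in markers)]


if __name__ == "__main__":
    with open("kernel.log") as f:
        for hit in find_oom_kills(f):
            print(hit.rstrip())
```

If OOM messages show up around the timestamps of the docker memory drop, the kill is memory pressure rather than the decoder crash alone.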