DeepStream 6.0 nvv4l2decoder suddenly uses 100% CPU and crashes the application

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU (A5000)
• DeepStream Version
DeepStream 6.0, 6.0.1, 6.2
• JetPack Version (valid for Jetson only)
• TensorRT Version
TensorRT 8.0.1
• NVIDIA GPU Driver Version (valid for GPU only)
Version 515.67
• Issue Type( questions, new requirements, bugs)
Potential Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

We are using the deepstream-app sample as a baseline for our product. Below is the configuration file we are using for the system:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=30

[source1]
enable=1
type=6 # custom source type to accept h264 stream through tcp
uri=127.0.0.1
gpu-id=0
cudadec-memtype=2
nvbuf-memory-type=3
num-sources=100
drop-frame-interval=10
intra-decode-enable=0
low-latency-mode=0
num-extra-surfaces=0

[sink0]
enable=1
type=6
sync=0
source-id=0
msg-conv-config=config_msgconv.txt
msg-conv-payload-type=258
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so
msg-broker-conn-str=localhost;9092;test_topic
topic=test_topic
msg-broker-config=/opt/nvidia/deepstream/deepstream-6.0.appliance/sources/apps/sample_apps/deepstream-test4/cfg_kafka.txt
new-api=0
disable-msgconv=0

[sink2]
enable=1
type=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=3

[streammux]
gpu-id=0
live-source=1
batch-size=16
batched-push-timeout=30000
width=1280
height=720
enable-padding=0
nvbuf-memory-type=3

[primary-gie]

[tracker]

[secondary-gie1]
[secondary-gie2]
[secondary-gie3]
[secondary-gie4]
[secondary-gie5]

[ds-example]

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Description of environment setting:

  • A5000 GPU, 32-core CPU (Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz), 64GB RAM
  • Use TCP stream and tcp_server src plugin
  • 100 channels of HD H.264 streams at 30 FPS
  • Use drop-frame-interval of 10 → 2.7-3 FPS per source
  • Primary detection model + multiple secondary classification models + ds_example post-processing + Kafka broker

Description of issue
The system is stable and runs without issues for a couple of hours, sometimes longer, with CPU usage of ~800-900% and ~14 GB RAM. At some point, however, CPU usage shoots up to ~2400-2500% within 3 seconds and stays at that level until the system crashes, after which CPU usage drops to 0%. From then on, the TCP server is unable to connect to the DeepStream sockets and send any frames. The behavior of deepstream-app after this is twofold: 1) it hangs indefinitely, with perf_cb showing 0 FPS for all channels, or 2) it crashes completely.

When testing with 32 channels and a lower drop-frame-interval, we never observed this behavior; the system ran stably for multiple days without outside intervention.

When analyzing per-thread CPU usage with htop, we found that nvv4l2decoder threads were consuming the vast majority of CPU resources at the time of the crash, so we believe the nvv4l2decoder plugin is the culprit.
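For reference, per-thread CPU usage can also be inspected like this (a sketch; it assumes deepstream-app is the only instance running):

# live per-thread view of the deepstream-app process
top -H -p $(pidof deepstream-app)
# one-shot snapshot of the busiest threads
ps -T -p $(pidof deepstream-app) -o spid,pcpu,comm --sort=-pcpu | head -n 20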

We experimented with the num-extra-surfaces property of the plugin and found that the higher the value, the faster the system crashes. However, even setting num-extra-surfaces to 0 does not eliminate the 100% CPU problem, and the system still crashes within 10-20 hours.

Searching this forum for similar issues, we found a somewhat similar report on an older version of DeepStream (4.0.1) for Jetson. However, we could not find any ideas or fixes that apply to the most recent dGPU versions of DeepStream.

We can upload the GST_DEBUG=4 logs as a separate file if needed.
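The logs are captured roughly like this (a sketch; the config path is the one from our setup):

GST_DEBUG=4 GST_DEBUG_FILE=gst-log-level4.log deepstream-app -c samples/configs/deepstream-app/config_app_tcpsrc_a30_ashim.txt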


Have you monitored the memory usage too?

Are you using DeepStream 6.2?

Since the source is customized, can you check with other sources whether the issue can be reproduced?

Maybe you can write a simple app which only includes the customization, i.e. just TCP stream source playback, to try the case.
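For example, a minimal decode-only pipeline along these lines could exercise just the TCP source and the decoder (a sketch only; the port and the plain H.264 byte-stream input are assumptions, adapt them to your stream):

gst-launch-1.0 tcpserversrc host=127.0.0.1 port=5000 ! h264parse ! nvv4l2decoder ! fakesink sync=false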

Hello Fiona,
I'm working with @bolat.ashim on the same project.

Have you monitored the memory usage too?
There was no serious memory problem when the CPU usage hit 100%.

Are you using DeepStream 6.2?
We tested our solution with 6.0, 6.0.1, and 6.2, and the same problem occurred in all versions.
We are currently using 6.0.1.

Since the source is customized, can you check with other sources whether the issue can be reproduced?
The source stream is customized: there is a custom header in the streams. After source_bin, however, the streams are converted to a normal TCP stream and we use the basic "tcpserversrc" GStreamer element.
We did not consider other sources, because the requirement to use TCP streams cannot be changed.
If necessary, we can test with 100 channels of local video files.

Maybe you can write a simple app which only includes the customization, i.e. just TCP stream source playback, to try the case.
Right now we are waiting for the program to die; after that we can share the GST_DEBUG=4 logs from running the whole pipeline.
Then we can share the test results for a simple app that contains only [source].

gst-log-level4.log (22.4 MB)

Here is the log from running the whole pipeline before the crash.
After CPU usage hit 100%, DeepStream did not die; all TCP connections went down and it is still hanging.

docker_stat_log_deepstream_container.txt (11.2 KB)
This is the "docker stats" log from DeepStream startup until it died.
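A loop along these lines can produce such a log (a sketch; the 30-second sampling interval is an assumption):

while true; do docker stats --no-stream >> docker_stat_log_deepstream_container.txt; sleep 30; done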

The log looks fine. We need to identify whether the problem is caused by the customization or by the DeepStream internal implementation. Maybe you can write a simple app which only includes the customization, i.e. just TCP stream source playback, to check whether the issue can be reproduced?

There is a system profiling tool: Nsight Systems | NVIDIA Developer. Can you try using it to check which threads and functions occupy the CPU when the CPU load spikes?

User Guide :: Nsight Systems Documentation (nvidia.com)

I have profiled with Nsight before, but this is the first time I have run it for this long.

After a few minutes, deepstream-app terminated automatically and an Nsight profile was created.
It terminated with an error; below is the backtrace from the core file.

[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
Core was generated by `/usr/bin/deepstream-app -c samples/configs/deepstream-app/config_app_tcpsrc_a30’.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fd3141500a8 in ?? ()
from /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
[Current thread is 1 (Thread 0x7fd2ebd81700 (LWP 5669))]
(gdb) bt
#0 0x00007fd3141500a8 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#1 0x00007fd313d78248 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#2 0x00007fd3141790b3 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#3 0x00007fd31415996e in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#4 0x00007fd31415ce44 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#5 0x00007fd31417b992 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#6 0x00007fd31418f264 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#7 0x00007fd31418dade in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#8 0x00007fd314517790 in ()
at /opt/nvidia/nsight-systems-cli/2023.2.1/target-linux-x64/libToolsInjection64.so
#9 0x00007fd310cd26db in start_thread ()
at /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fd31100b71f in clone ()
at /lib/x86_64-linux-gnu/libc.so.6

I used this command:
nsys profile --force-overwrite true -s cpu --osrt-threshold=1000000000 --delay 20 -o deepstream deepstream-app -c samples/configs/deepstream-app/config_app_tcpsrc_a30_ashim.txt

If I don't use "--osrt-threshold", deepstream-app is very slow.
It is not related to the running time.

If I use "-t cuda,nvtx", nsys runs for a long time, but it dies the same way as above.

Please share a guide for profiling DeepStream over a long period.

@BeomJun_Kim

The hardware decoder is overloaded.

The A5000 is an Ampere GPU with 2 hardware NVDEC decoder cores (5th-generation NVDEC).
Video Encode and Decode GPU Support Matrix | NVIDIA Developer

Even with 5 NVDEC cores, the A100 GPU can only support decoding of up to 83 H.264 1080p@30fps streams. Video Codec SDK | NVIDIA Developer

However, when we tested on another testbed, the environment was as follows:

100 channels (D1, 720x480), 15 FPS

Even then, similar problems continued to occur.
Could the number of channels matter, rather than the size of the video?
Or is it difficult to decode even D1 on the A5000?

The really curious thing is that there are times when it runs for more than 24~30 hours without problems.

The same A5000 GPU? 100 x D1@15fps may work on the A5000. Can you get the kernel log after the failure occurs?

It’s difficult to share quickly.

We don't have the D1 testbed now.
But we will test (HD, 30fps, 50-70 channels) and (HD, 15fps, 100 channels).

Is there any benchmark that explains the limits for HD streams?
e.g. the maximum number of HD 30fps streams per NVDEC core

Since the A5000 only has two NVDEC cores, it may only support up to 37 1080p@30fps H.264 streams.

The benchmark is in Video Codec SDK | NVIDIA Developer

According to data on the Internet, 720p decode throughput is about twice that of 1080p.

So, would 60 to 70 streams of 720p be okay on two NVDEC cores?
I can't find a 720p NVDEC benchmark.
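As a rough back-of-the-envelope check (using the 37-stream 1080p figure above): 1280x720 = 921,600 pixels versus 1920x1080 = 2,073,600 pixels, so a 720p frame carries roughly 0.44x the pixels of a 1080p frame; scaling the 37-stream figure by ~2x suggests roughly 70-80 streams of 720p@30fps on the two NVDEC cores, which is why 60-70 streams looks feasible to us.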

Yes. It can.

Thanks! @Fiona.Chen

I'll test it over the weekend and share the result.

@Fiona.Chen

When we tested 50 streams of 720p@30fps, it did not die.
With 80 streams, it has not died for 2 days. I'll keep monitoring it.

We thought 100 channels was possible because it sometimes ran for up to 20 hours…

We are going to test the upper limit for 720p@30fps.

Thanks.
It’s okay to close this issue.

Glad to hear that! But 100 channels of 720p@30fps may be close to the upper limit, so it may be unstable.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.