• Hardware Platform (GPU)
• DeepStream Version 7.1
• NVIDIA GPU Driver Version (valid for GPU only) 550, 560, 570
• Issue Type (bugs)
• How to reproduce the issue ? GitHub - lumeohq/deepstream-encoder-segfault
Hello!
We see frequent segfaults when running multiple pipelines in multiple threads in the same process. Under some conditions, the program segfaults with roughly an 80% probability within the first 5 seconds, so it is easy to reproduce. We tested this on 2 different machines with different drivers. The 535 driver is not affected, but 550, 560 and 570 are.
More details are in the README on GitHub. The MRE contains a script that runs a simple C application using the nvcr.io/nvidia/deepstream:7.1-triton-multiarch docker image.
GDB backtrace:
Thread 32 "videotestsrc2:s" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x732e31000640 (LWP 222)]
0x0000732e71417941 in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
I used the 570 driver on a 3070 Ti to test the case you provided, and it exits normally without any crash.
wget https://us.download.nvidia.com/tesla/570.133.20/nvidia-driver-local-repo-ubuntu2204-570.133.20_1.0-1_amd64.deb
sudo dpkg -i nvidia-driver-local-repo-ubuntu2204-570.133.20_1.0-1_amd64.deb
sudo cp /var/nvidia-driver-local-repo-ubuntu2204-570.133.20/nvidia-driver-local-6AA56764-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install cuda-drivers
However, the 3070 Ti has a limit on the number of encoder instances. You can refer to this table.
Are you testing on Ubuntu 22.04? We have only tested DS-7.1 on Ubuntu 22.04.
We have Ubuntu 22.04 in prod. My dev machine runs Mint 22.1 (similar to Ubuntu 24.04), and it crashes there too. The driver is installed from the Linux Mint driver manager: 570.144 (open).
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144                Driver Version: 570.144        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 Ti     Off |   00000000:0A:00.0  On |                  N/A |
|  0%   48C    P8             28W /  310W |    1340MiB /   8192MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
I also tried installing from the NVIDIA-Linux-x86_64-570.144.run file downloaded from NVIDIA, and the test still fails.
Have you tried running run.sh multiple times? It only fails with some probability. For instance, I ran it 5 times:
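Since the behaviour reported so far depends on the driver branch, a small helper can make it easier to compare machines. This is only a sketch: the function name driver_status is hypothetical, and the affected/unaffected lists simply restate what was observed in this thread (535 fine; 550, 560, 570 crashing), not an official compatibility list.

```shell
# driver_status VERSION
# Hypothetical helper: classifies a driver version string by its major branch,
# based only on the observations reported in this thread.
driver_status() {
  case "${1%%.*}" in            # strip everything after the first dot
    535) echo "reported unaffected" ;;
    550|560|570) echo "reported affected" ;;
    *) echo "unknown" ;;
  esac
}
```

On a machine with the driver installed, it could be fed from nvidia-smi, for example: driver_status "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"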
Running in normal mode...
Exit code: 139
...
Running in normal mode...
Exit code: 0
...
Running in normal mode...
Exit code: 0
...
Running in normal mode...
Exit code: 0
...
Running in normal mode...
Exit code: 139
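The repeated runs above can be automated with a small loop. This is a minimal sketch, not part of the MRE: the function name count_segfaults is hypothetical, and it assumes the command under test propagates the crash as exit code 139 (128 + 11, i.e. SIGSEGV), as run.sh does in the output above.

```shell
# count_segfaults RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times and counts how many
# runs ended with exit code 139 (128 + SIGSEGV).
# The body is a subshell, so its variables do not leak into the caller.
count_segfaults() (
  runs=$1
  shift
  crashes=0
  i=1
  while [ "$i" -le "$runs" ]; do
    # Capture the exit code without tripping "set -e" in the caller.
    "$@" >/dev/null 2>&1 && code=0 || code=$?
    echo "run $i: exit code $code"
    if [ "$code" -eq 139 ]; then
      crashes=$((crashes + 1))
    fi
    i=$((i + 1))
  done
  echo "segfaults: $crashes/$runs"
)
```

For example, count_segfaults 20 ./run.sh would run the reproduction 20 times and print a summary line such as "segfaults: N/20".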
You can also run:
sudo apt install python3-tabulate
./table.py
and leave it running for half an hour. If the resulting table is all zeros, then it works well on your setup.