Multiple FFMpeg-Cuda-HLS-Transcoding Instances -> Deadlock Behavior

marcokittel · April 29, 2020, 8:25am

Hello,

I’m using an NVIDIA Quadro P2200 and the latest Ubuntu Linux to transcode multiple multicast streams into HLS. I tried different versions of the NVIDIA driver, including the latest one; right now I’m on 440.33.01, running headless. Transcoding works flawlessly with CPU encoding and libx264. But when I switch to CUDA, 2 of 3 processes deadlock after some time. I opened three bash shells and made a screencast to show you the problem. At minute six you will notice that the first stream stops doing anything. https://www.youtube.com/watch?v=QOaf7v_Gwwk

Because I’m using scale_npp, I built ffmpeg myself:

./configure --enable-libx264 --enable-cuvid --enable-gpl --enable-libnpp --enable-cuda --disable-cuda-sdk --enable-nonfree --extra-cflags=-I/usr/local/cuda-10.2/include --extra-ldflags=-L/usr/local/cuda-10.2/lib64 && make -j 8

I tried different combinations of CUDA and driver versions, and the behavior was the same everywhere. I also tried different ffmpeg commands with the same result. How can I get HLS encoding working? A single transcoding process works. If I run more than one ffmpeg process, they fail one after another until only a single process is still working. In other words, all the other transcoding processes seem to deadlock. I looked at the thread stacks with gdb -p pid, but it did not help. How can I fix this issue?

ffmpeg version N-97495-g2594f6a362 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.2.1-9ubuntu2)
configuration: --enable-libx264 --enable-cuvid --enable-gpl --enable-libnpp --enable-cuda --disable-cuda-sdk --enable-nonfree --extra-cflags=-I/usr/local/cuda-10.2/include --extra-ldflags=-L/usr/local/cuda-10.2/lib64
libavutil 56. 43.100 / 56. 43.100
libavcodec 58. 82.100 / 58. 82.100
libavformat 58. 42.101 / 58. 42.101
libavdevice 58. 9.103 / 58. 9.103
libavfilter 7. 79.100 / 7. 79.100
libswscale 5. 6.101 / 5. 6.101
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
Hyper fast Audio and Video encoder

exec $FFMPEG_PATH \
-vsync 0 \
-loglevel debug \
-threads:v 1 \
-threads:a 1 \
-filter_threads 1 \
-thread_queue_size 1024 \
-hwaccel cuda \
-hwaccel_device 0 \
-hwaccel_output_format cuda \
-deint adaptive \
-i "udp://$MULTICAST_ADDRESS:$PORT" \
-filter_complex "[v:0]split=4[temp1][temp2][source][temp3];[temp1]scale_npp=858:480[480p];[temp2]scale_npp=640:360[wide360p];[temp3]scale_npp=426:240[240p]" \
-g 50 -sc_threshold 0 \
-map [wide360p] \
-preset medium \
-c:v:0 h264_nvenc \
-preset fast \
-profile:v baseline \
-b:v:0 600k \
-bufsize 24k \
-minrate 400k -maxrate 600k \
-map [480p] \
-c:v:1 h264_nvenc \
-preset medium \
-profile:v baseline \
-b:v:1 1000k \
-bufsize 56k \
-minrate 800k -maxrate 1600k \
-preset fast \
-map [source] \
-c:v:2 h264_nvenc \
-preset medium \
-profile:v baseline \
-preset fast \
-b:v:2 3600k \
-minrate 2000k -maxrate 4000k \
-bufsize 144k \
-map [240p] \
-c:v:3 h264_nvenc \
-preset medium \
-profile:v baseline \
-zerolatency 1 \
-preset fast \
-b:v:3 400k \
-bufsize 16k \
-map a:0 \
-c:a aac \
-b:a 128k \
-ac 2 \
-map a:1 \
-c:a aac \
-b:a 96k \
-ac 2 \
-f hls \
-hls_time 4 \
-hls_list_size 0 \
-hls_flags append_list \
-hls_allow_cache 0 \
-hls_playlist_type event \
-master_pl_name $MASTER_PLAYLIST_NAME \
-var_stream_map "a:0,agroup:audio,default:yes,language:DEU a:1,agroup:audio,language:FR v:0,agroup:audio v:1,agroup:audio, v:2,agroup:audio, v:3,agroup:audio" \
$SEGMENT_FILE_NAME \
$MEDIA_PLAYLIST_PREFIX

marcokittel · April 29, 2020, 3:51pm

I rebuilt ffmpeg using the build configuration from the official document Using_FFmpeg_with_NVIDIA_GPU_Hardware_Acceleration_v01.4.pdf.

ffmpeg version N-97515-gd813e43b3d Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 8 (Ubuntu 8.4.0-1ubuntu1~19.10)
configuration: --enable-nonfree --enable-cuda-nvcc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64

But it does not make any difference. Parallel transcoding still fails: only one process keeps working, and the other processes stop. I don’t know how to get this to work properly. It’s very disappointing right now. I still hope to find a solution for this behavior.

generix · April 30, 2020, 9:36am

Did you check if vmem fills up while transcoding?

marcokittel · April 30, 2020, 12:15pm

I just checked. The system has 64 GB of RAM and a small 1 GB swap disk, but the swap is not being used.

generix · April 30, 2020, 12:51pm

I meant video memory, not system memory. Use nvidia-smi to check usage.
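
One way to follow this suggestion while the transcodes run is to sum the per-process video memory that nvidia-smi reports. This is a minimal sketch, assuming the installed nvidia-smi supports the --query-compute-apps CSV query; the helper name sum_gpu_mem is hypothetical and not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Sum per-process GPU memory (in MiB) from nvidia-smi CSV lines on stdin.
# Expected input, one line per process: "<pid>, <used_memory_MiB>"
# (hypothetical helper; the name is an assumption for illustration)
sum_gpu_mem() {
  awk -F',' '
    { gsub(/[^0-9]/, "", $2); total += $2 }   # keep digits only, then add
    END { print total + 0 }                   # print 0 when there is no input
  '
}
```

It could be fed like this: nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits | sum_gpu_mem. Comparing the result with the card's total memory shows whether the instances are close to exhausting it.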

marcokittel · April 30, 2020, 1:03pm

Ah, I thought you meant virtual RAM. You can see in the YouTube video that I opened nvidia-smi at the beginning. And yes, the video RAM gets allocated, and when a transcoding process freezes, the memory is still in use.

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     13736      C   /home/hls/ffmpeg/ffmpeg                      196MiB |
|    0     14476      C   /home/hls/ffmpeg/ffmpeg                      196MiB |
|    0     14523      C   /home/hls/ffmpeg/ffmpeg                      196MiB |
+-----------------------------------------------------------------------------+

Process 13736 is not transcoding anymore, but the video RAM is still in use. I don’t know what is happening internally, because I can’t look into libcuda with gdb.
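
Since a stalled instance keeps its video memory allocated, nvidia-smi alone does not show which process has stopped. A rough sketch of a check based on the playlist's modification time follows. It assumes GNU stat (as on Ubuntu) and that a healthy ffmpeg instance rewrites its media playlist every few seconds; the function name and the threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 (stalled) if the playlist is missing or has not been modified
# within the last max_age seconds; return 1 (alive) otherwise.
# (hypothetical helper; name and arguments are assumptions)
is_stalled() {
  local playlist=$1 max_age=$2 now mtime
  [ -e "$playlist" ] || return 0
  now=$(date +%s)
  mtime=$(stat -c %Y "$playlist")   # GNU stat: mtime in epoch seconds
  [ $(( now - mtime )) -gt "$max_age" ]
}
```

With -hls_time 4, a call such as is_stalled /path/to/media.m3u8 20 would flag an instance that has written no new segment entry for roughly five segment durations; the path here is a placeholder.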

generix · May 2, 2020, 9:29pm

I rather suspected an out-of-vmem condition; people have had problems with that before. That isn’t the case, though. I don’t know whether a gcc version >8 has any influence on it. Did you ask the ffmpeg project?

marcokittel · May 4, 2020, 8:25am

I rebuilt ffmpeg with gcc 8. I will try another gcc version later today. I have just posted my issue to the ffmpeg user list. How can I verify or rule out that it is an out-of-vmem problem?

(I will search the forum for out of vmem; maybe I’ll find something interesting.)

generix · May 4, 2020, 8:36am

It is not a problem with video memory; nvidia-smi reports 5059MiB free.

marcokittel · May 4, 2020, 10:28am

Generix, sorry, I was not concentrating when I read your post, so I missed that you had already ruled out vmem. I need to improve my English reading skills in technical matters. In any case, I rebuilt ffmpeg with gcc 7.5.0, with the same results.

Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~19.10' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~19.10)

generix · May 5, 2020, 11:58am

Maybe some general advice: since you’re running headless, please make sure nvidia-persistenced is started on boot and is continuously running.
Since you’re using multiple ffmpeg processes, try using MPS.
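
Whether persistence mode is actually active can be checked from the shell. A small sketch, assuming nvidia-smi's persistence_mode query field, which reports Enabled or Disabled for each GPU; the helper name persistence_enabled is hypothetical.

```shell
#!/usr/bin/env bash
# Read one persistence-mode value per line on stdin and succeed only if
# at least one GPU is listed and every GPU reports "Enabled".
# (hypothetical helper; the name is an assumption for illustration)
persistence_enabled() {
  local seen=0 mode
  while read -r mode; do          # read trims surrounding whitespace
    seen=1
    [ "$mode" = "Enabled" ] || return 1
  done
  [ "$seen" -eq 1 ]
}
```

Possible usage: nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | persistence_enabled && echo "persistence mode is on".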

marcokittel · May 6, 2020, 9:07am

Thanks, I’m reading https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf right now.

It seems that the Pascal architecture of the P2200 is not supported by MPS in the video rendering context.

The NVIDIA Codec SDK (NVIDIA VIDEO CODEC SDK | NVIDIA Developer) is not supported under MPS on pre-Volta MPS clients.

If I try to run an ffmpeg instance with the MPS server running on the P2200 in exclusive mode, I get:

[Parsed_scale_cuda_1 @ 0x55ff024f2cc0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_split_0' and the filter 'Parsed_scale_cuda_1'
Impossible to convert between the formats supported by the filter 'Parsed_split_0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:3
[aac @ 0x55ff00c6ed40] Qavg: 205.839
[aac @ 0x55ff00c6ed40] 2 frames left in the queue on closing
[aac @ 0x55ff00c44680] Qavg: 207.565
[aac @ 0x55ff00c44680] 2 frames left in the queue on closing
[AVIOContext @ 0x55ff00be41c0] Statistics: 5846988 bytes read, 0 seeks
Conversion failed!

So MPS is no solution at all, but it was a very interesting read and at least worth a try. So what is the reason for my issue? Is it possibly a bug in ffmpeg’s vmem allocation?

marcokittel · May 19, 2020, 8:09am

Hello Nvidia, any hints?

generix · May 20, 2020, 10:55am

I don’t know whether Nsight Compute can help with NVENC debugging. Did you look into it?
