We have noticed that when encoding in real time with FFmpeg and NVENC, CPU utilization more than triples going from CUDA 11.4.2 to CUDA 11.6.2. At the same time, CUDA 11.4.2 achieves slightly more frames per second.
We also observed that FFmpeg uses 15 threads on CUDA 11.4.2 and 102 on CUDA 11.6.2, which may be related to this issue.
We used a Python script to start and monitor the FFmpeg process; we can provide it if needed.
We used the official FFmpeg 5.0 binary with nvenc and h264 from GyanD (link removed).
Video URL: https://media.xiph.org/video/derf/y4m/touchdown_pass_1080p.y4m
| CUDA | CPU Load (mean) [%] | CPU Load STD [%] | CPU Load MIN [%] | CPU Load MAX [%] |
|---|---|---|---|---|
| 11.6.2 | 2.02 | 0.47 | 1.32 | 2.69 |
| 11.4.2 | 0.66 | 0.15 | 0.46 | 0.93 |
FFmpeg command: ffmpeg -y -re -stream_loop 2 -i touchdown_pass_1080p.y4m -c:v h264_nvenc -b:v 10M touchdown_pass_1080p.mp4
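A monitoring script along the lines of the one mentioned above can be sketched as follows. This is a minimal illustration, not the exact script used for the numbers in the table. It assumes the third-party `psutil` package is installed; the function names `summarize` and `monitor`, the 1-second sampling interval, and the normalization by logical CPU count are all assumptions made for this example.

```python
import statistics
import subprocess

# Command taken from the post; everything else here is illustrative.
FFMPEG_CMD = [
    "ffmpeg", "-y", "-re", "-stream_loop", "2",
    "-i", "touchdown_pass_1080p.y4m",
    "-c:v", "h264_nvenc", "-b:v", "10M",
    "touchdown_pass_1080p.mp4",
]


def summarize(samples):
    """Return mean, sample standard deviation, min and max of the samples."""
    return {
        "mean": statistics.mean(samples),
        "std": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


def monitor(cmd=FFMPEG_CMD, interval=1.0):
    """Run cmd and sample its CPU load and thread count until it exits."""
    import psutil  # third-party; imported here so summarize() works without it

    n_cpus = psutil.cpu_count() or 1
    cpu, threads = [], []
    with subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ) as child:
        proc = psutil.Process(child.pid)
        while child.poll() is None:
            try:
                # cpu_percent() blocks for `interval` seconds. Dividing by the
                # logical CPU count (an assumption about how the table was
                # normalized) expresses the load as a share of the whole machine.
                cpu.append(proc.cpu_percent(interval=interval) / n_cpus)
                threads.append(proc.num_threads())
            except psutil.NoSuchProcess:
                break  # FFmpeg exited between the poll and the sample
    stats = summarize(cpu) if cpu else None
    return stats, max(threads, default=0)
```

Calling `monitor()` once per installed CUDA version and comparing the returned statistics and peak thread count should give numbers comparable in shape to the table, although absolute values depend on the sampling interval and the normalization chosen.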
Used system:
CPU: AMD Epyc 7352 (24 cores, 48 threads)
GPU: Nvidia Quadro RTX 4000
Memory: 64GB @ 3200 MHz (enough to cache all videos)
Mainboard: Gigabyte MZ32-AR0
OS: Windows 10 Pro 21H2 64-bit
CUDA Toolkit versions:
cuda_11.4.2_471.41_win10
cuda_11.6.2_511.65_windows
Does anyone know whether this is a regression/bug or how to work around this?