I need to deinterlace and process lots of live video streams using ffmpeg (with CUDA/NVENC/NVDEC enabled) and am at the stage where I need to select a GPU specifically for the deinterlacing part. Does anyone know which GPU specification matters most for this task? My reasoning: if an RTX 3050 performs exactly the same as an RTX 3090 at this task, it would be nice to buy the cheaper card…
Hello again @Minutemqnqvs, it has been a while.
- “Lots of video streams” sounds like you need high bandwidth and throughput.
- “Deinterlacing” means recombining buffers after decoding with NVDEC, which implies the use of CUDA within the Video Codec SDK.
Both of these depend on memory size, memory bandwidth, cache size and CUDA core count, which differ greatly between GPU designs. The RTX 3090, for example, has roughly four times the CUDA core count and memory bandwidth of an RTX 3050.
Regarding the raw decoding power of the dedicated NVDEC engine, you can find a rough performance comparison between GPU generations in `NVDEC_Application_Note.pdf`, which is part of the Video Codec SDK download. It has no data on individual GPU models, but might still be helpful.
And I hope you are aware of the NVENC/NVDEC GPU support matrix, which lists which GPUs support which video codecs for decoding and encoding.
Beyond this, it is really difficult to give a recommendation. It depends a lot on the specific use case: whether it is offline processing where run time does not matter, or real-time streaming of many parallel streams. The latter would probably already put you in the professional GPU segment, with more than one NVDEC engine per GPU.
Thanks for your reply, I will try to measure the load of this job using nvidia-smi. If it does indeed relate to the number of CUDA cores, that’s good news, as it makes it a simple variable to shop for :) And yes, I’m aware of the matrix, a very useful resource.
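One way to take that measurement is `nvidia-smi dmon -s u`, which prints per-GPU utilization columns such as `sm`, `mem`, `enc` and `dec` once per second. Below is a minimal Python sketch that runs it and parses the output, assuming the first `#` line holds the column names and the second holds the units (newer drivers may print extra columns, which the parser simply keeps). The helper names `parse_dmon`, `sample_utilization` and `peak` are hypothetical.

```python
import subprocess


def parse_dmon(text):
    """Parse `nvidia-smi dmon -s u` output into one dict per sample row.

    Assumption: the first '#' line lists column names (e.g. gpu sm mem enc dec)
    and later '#' lines (units, repeated headers) can be skipped.
    A '-' value (metric not available) becomes None.
    """
    columns = None
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if columns is None:
                columns = line.lstrip("#").split()
            continue  # skip the units line and any repeated headers
        if columns is None:
            continue  # ignore anything printed before the header
        row = {}
        for name, value in zip(columns, line.split()):
            row[name] = None if value == "-" else int(value)
        samples.append(row)
    return samples


def sample_utilization(count=10):
    """Collect `count` one-second samples (needs an NVIDIA driver and nvidia-smi)."""
    out = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dmon(out)


def peak(samples, column):
    """Highest value seen in one column, or None if it was never reported."""
    values = [s[column] for s in samples if s.get(column) is not None]
    return max(values) if values else None
```

Calling `sample_utilization()` while the ffmpeg jobs are running and comparing `peak(samples, "dec")` with `peak(samples, "sm")` should give a rough idea of whether NVDEC or the CUDA cores (used by a CUDA deinterlacing filter) saturate first.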
I have a couple of Tesla T4s on hand to test this workflow. The actual video encoding of the live streams is done by non-NVIDIA hardware, namely dedicated ASICs. But these ASICs don’t do deinterlacing.
Option 1) is to do it in software on the CPU, but even a 32-core AMD EPYC 7xx4 saturates quite quickly.
Option 2) is using a GPU, which is what I’m currently investigating.
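For option 2, a common ffmpeg pattern is to decode on NVDEC, keep the frames in GPU memory, and deinterlace them with the `yadif_cuda` filter. The following is a minimal Python sketch that only assembles such a command line; it assumes an ffmpeg build that includes `yadif_cuda`, 8-bit input (hence `format=nv12` after `hwdownload`), and that raw frames are handed to the encoding ASICs via a pipe. The function name `build_deinterlace_cmd` and the file names are hypothetical.

```python
import shlex


def build_deinterlace_cmd(src, dst, mode="send_field"):
    """Assemble an ffmpeg argv: NVDEC decode -> yadif_cuda deinterlace -> raw frames.

    mode="send_field" emits one frame per field (e.g. 50 fields/s -> 50 frames/s);
    mode="send_frame" emits one frame per input frame.
    Assumes 8-bit content, so frames are downloaded from the GPU as NV12.
    """
    filters = f"yadif_cuda=mode={mode},hwdownload,format=nv12"
    return [
        "ffmpeg",
        "-hwaccel", "cuda",                # decode on NVDEC
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", src,
        "-vf", filters,                    # deinterlace on the GPU, then copy to system memory
        "-f", "rawvideo",                  # raw frames for the external encoder
        dst,
    ]


if __name__ == "__main__":
    # "pipe:1" writes to stdout, e.g. for piping into the encoder's ingest.
    print(shlex.join(build_deinterlace_cmd("input.ts", "pipe:1")))
```

Keeping the frames on the GPU between decoding and deinterlacing avoids one round trip over PCIe per frame; only the finished progressive frames are downloaded.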
So if I look at the NVDEC Application Note (NVIDIA Docs), it would mean that an RTX 4090 could decode H.264 in real time for 883 / 50 ≈ 17.7, i.e. around 17 full streams at 1080p 50 fps, right?
And a Tesla T4 has 2 NVDEC engines, so very roughly 719 * 2 / 50 ≈ 28.8, i.e. around 28 full streams at 1080p 50 fps.
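The estimate above can be written as a small helper: aggregate decode throughput divided by the per-stream frame rate, rounded down because a partially served stream is not real time. This is a sketch that assumes throughput scales linearly with the number of NVDEC engines and ignores the cost of deinterlacing; using 1 engine for the RTX 4090 simply mirrors the calculation in the question. The name `realtime_streams` is hypothetical.

```python
import math


def realtime_streams(decode_fps, nvdec_engines, stream_fps):
    """Whole streams decodable in real time.

    Assumes decode throughput scales linearly with the engine count and
    that decoding is the only bottleneck (no deinterlacing cost included).
    """
    return math.floor(decode_fps * nvdec_engines / stream_fps)


if __name__ == "__main__":
    print(realtime_streams(883, 1, 50))  # RTX 4090, H.264 1080p50 -> 17
    print(realtime_streams(719, 2, 50))  # Tesla T4, 2 NVDEC engines -> 28
```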
Yes, that math is correct.
The necessary disclaimer, though: in your specific setup the numbers might vary.