NVDEC in FFMPEG (cuvid) drops frames when using deinterlacing (deint 2)

Out of curiosity I just ran some tests to get an idea of the difference in quality, if any, between yadif, yadif_cuda, cuvid deint and mcdeint.

The test was based on the Big Buck Bunny movie and the procedure should be self-evident from the commands below, but roughly:

  • transcode the original to NV12 lossless, use this as the base for comparison
  • create an interlaced version
  • deinterlace with yadif, compare with base
  • deinterlace with yadif_cuda, compare with base
  • deinterlace with cuvid, compare with base
  • deinterlace with mcdeint, compare with base

The results appear to suggest cuvid isn’t actually very good, at least for this example, which surprises me a bit because I thought it was meant to be doing something more like mcdeint. Perhaps there is something wrong with my test? I get the same results with ffmpeg 3.3.4 as I do with 4.1.

SSIM results

yadif 0.993549
yadif_cuda 0.993395
mcdeint 0.997456
cuvid 0.948984

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -f mp4 original.mp4 -y
ffmpeg -i original.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -vf tinterlace=interleave_top,fieldorder=tff -flags ilme+ildct -f mp4 interlaced.mp4 -y
ffmpeg -i interlaced.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -vf yadif=1 -r 60 -f mp4 deinterlaced_yadif.mp4 -y
ffmpeg -init_hw_device cuda=0 -i interlaced.mp4 -vcodec h264_nvenc -pix_fmt cuda -preset lossless -filter_hw_device 0 -vf hwupload,yadif_cuda=1 -acodec copy -r 60 -f mp4 deinterlaced_yadif_cuda.mp4
ffmpeg -i interlaced.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -vf yadif=1,mcdeint=fast:tff:1 -r 60 -f mp4 deinterlaced_mcdeint.mp4 -y
ffmpeg -vcodec h264_cuvid -deint adaptive -i interlaced.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -r 60 -f mp4 deinterlaced_cuvid.mp4 -y
ffmpeg -i deinterlaced_yadif.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_yadif_cuda.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_mcdeint.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_cuvid.mp4 -i original.mp4 -lavfi ssim -f null -
pause

EDIT: updated to include yadif_cuda

@oviano: Got similar results, although I used -preset faster -crf 8 -g 60 because I didn’t have many GBs of free space.
For the cuvid example, I had to use -r 60 on input with ffmpeg 4.1, otherwise I got dropped frames (not many, but some). Haven’t tried with ffmpeg 3.3.x.
For the yadif example, -r 60 on output is not needed.

cuvid deint adaptive
SSIM Y:0.927847 (11.417453) U:0.985201 (18.297618) V:0.987745 (19.116768) All:0.947389 (12.789227)
yadif
SSIM Y:0.988896 (19.545393) U:0.994195 (22.362096) V:0.994909 (22.931717) All:0.990782 (20.353473)
yadif with cuvid decoder
SSIM Y:0.988899 (19.546533) U:0.994197 (22.363237) V:0.994910 (22.932858) All:0.990784 (20.354613)
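
For anyone wondering about the number in parentheses on each line: it looks like the same score expressed in decibels, -10*log10(1 - SSIM), which is consistent with the figures above. A minimal sketch in shell, where ssim_to_db is a helper name made up for illustration (it is not part of ffmpeg):

```shell
# ssim_to_db S: convert a linear SSIM score (0 <= S < 1) to decibels,
# assuming the relation dB = -10 * log10(1 - S).
# Hypothetical helper, not an ffmpeg command.
ssim_to_db() {
    # awk's log() is the natural logarithm, so divide by log(10).
    awk -v s="$1" 'BEGIN { printf "%.2f\n", -10 * log(1 - s) / log(10) }'
}

ssim_to_db 0.947389   # "All" score for cuvid deint adaptive
ssim_to_db 0.990782   # "All" score for yadif
```

On the dB scale the gap between cuvid and yadif is about 7.5 dB, which is easier to read than the difference between 0.947 and 0.991.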

Have you tried yadif_cuda?

Yes, I’ve updated my post to include yadif_cuda… basically the same as yadif; the tiny difference is probably down to rounding errors or something similar.

I also tried yadif with the cuvid decoder and got similar results to you. I thought maybe the cuvid decoder might be doing something bad, but apparently not.

Did you try a different source file to me?

Tried the same source. I want to do a test with the interlace filter instead of tinterlace, which is what I use in production to convert some 720p50 live streams to 576i50 with:
-vf 'scale=544:576:flags=bicubic,interlace=lowpass=0:scan=tff'

Tried a different file, a recording at 720p50 from the Beijing Olympics opening ceremony (a five-minute sample of it, actually), and got completely different results: yadif is at 0.69 while cuvid + deint adaptive is at 0.9.

If you want to try it with lossless settings (I can’t, since I don’t have that much disk space), it is here: [Dropbox link, file since deleted]

Thanks, I’ll give that a go. Are you upscaling to 1080p50 first, or just interlacing to 720i50?

Just interlacing to 720i50, then deinterlace back to 720p50.

So with lossless I get a similar pattern to Big Buck Bunny:

yadif 0.980813
yadif_cuda 0.979554
mcdeint 0.987913
cuvid deint adaptive 0.895912

My commands, for the record:

ffmpeg -i olympics.ts -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -f mp4 original.mp4 -y
ffmpeg -i original.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -vf tinterlace=interleave_top,fieldorder=tff -flags ilme+ildct -f mp4 interlaced.mp4 -y
ffmpeg -i interlaced.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -vf yadif=1 -r 50 -f mp4 deinterlaced_yadif.mp4 -y
ffmpeg -init_hw_device cuda=0 -i interlaced.mp4 -vcodec h264_nvenc -pix_fmt cuda -preset lossless -filter_hw_device 0 -vf hwupload,yadif_cuda=1 -acodec copy -r 50 -f mp4 deinterlaced_yadif_cuda.mp4
ffmpeg -i interlaced.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -vf yadif=1,mcdeint=fast:tff:1 -r 50 -f mp4 deinterlaced_mcdeint.mp4 -y
ffmpeg -vcodec h264_cuvid -deint adaptive -i interlaced.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -r 50 -f mp4 deinterlaced_cuvid.mp4 -y
ffmpeg -i deinterlaced_yadif.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_yadif_cuda.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_mcdeint.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_cuvid.mp4 -i original.mp4 -lavfi ssim -f null -
pause

I’m glad you guys are finding the yadif_cuda filter useful; guess I did that at the right time. :-)

oviano: Note that your tests are not fully representative. You’re using nvenc as your encoder in the yadif_cuda example but libx264 in the other examples. That will likely explain your SSIM differences, or at least more of them than rounding differences would (I hope there aren’t any, but can’t promise).

A completely fair comparison would use the same encoder for all examples and use the nvdec hwaccel when not using cuvid (same hardware decoding in both cases)

eg:

ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf yadif_cuda=mode=1 -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_yadif_cuda.mp4
ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf hwdownload,format=nv12,yadif=mode=1,hwupload_cuda -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_yadif.mp4
ffmpeg -hwaccel cuda -c:v h264_cuvid -deint adaptive -i interlaced.mp4 -r 50 -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_cuvid.mp4

Unfortunately, I can’t comment on the dropped frames using the cuvid deinterlacing.

Yes, the filter is very useful for my hobby project (emustream.tv) - nice work! (should add that I don’t distribute an ffmpeg with cuda-sdk for obvious reasons).

Point taken re: my tests. I was assuming that nvenc preset lossless and libx264 -crf 0 would produce an equivalent result given the same pixel format. I guess I’m overlooking something there, then?

I will re-run my tests with your suggestion, just out of curiosity.

PS how about a mcdeint_cuda… :)

You should test nvenc vs libx264 lossless with everything else identical and see what happens. Get a baseline. Maybe I’m wrong and the difference is all in the filter.

Maybe I’ll do mcdeint. I’ve been working on a cuda version of bwdif (which you should try - I’m curious) but I’m still seeing visible differences in output on some samples. Can’t make sense of it at the moment.

As for ‘ffmpeg’: the command line tool can’t really cope with cuvid deinterlacing. Deinterlacing in the decoder means the decoder is outputting 2 frames for every input packet. The ffmpeg tool basically can’t handle this correctly: it initialises other parts of the pipeline using the demuxed frame rate, and those don’t change when the decoder starts spitting out frames faster. In very simple cases (probably with no filters), it might work correctly with a forced framerate, but it’s inherently fragile.

But to be clear, this is specific to the ffmpeg tool. The actual libraries handle this all just fine and someone can write a tool with the libraries to do this (as a media player, mpv handles it just fine, for example). However, the ffmpeg tool is very set in its ways.

So here are the results with h264_nvenc as the encoder in all cases, suggesting that libx264 and h264_nvenc lossless are indeed equivalent.

This is for the olympics sample (720p50 → 720i50 → 720p50).

yadif 0.980813
yadif_cuda 0.979554
mcdeint 0.987913
bwdif 0.986668
w3fdif 0.983495
cuvid 0.895912

bwdif looks good, and fast too even without a cuda version.

Commands below for reference.

ffmpeg -i olympics.ts -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -acodec copy -f mp4 original.mp4 -y
ffmpeg -i original.mp4 -vcodec libx264 -pix_fmt nv12 -preset ultrafast -crf 0 -g 1 -vf tinterlace=interleave_top,fieldorder=tff -flags ilme+ildct -f mp4 interlaced.mp4 -y
ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf hwdownload,format=nv12,yadif=mode=1,hwupload_cuda -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_yadif.mp4 -y
ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf yadif_cuda=mode=1 -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_yadif_cuda.mp4 -y
ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf hwdownload,format=nv12,yadif=mode=1,mcdeint=fast:tff:1,hwupload_cuda -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_mcdeint.mp4 -y
ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf hwdownload,format=nv12,bwdif=mode=1,hwupload_cuda -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_bwdif.mp4 -y
ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i interlaced.mp4 -vf hwdownload,format=nv12,setfield=tff,w3fdif,hwupload_cuda -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_w3fdif.mp4 -y
ffmpeg -hwaccel cuda -c:v h264_cuvid -deint adaptive -i interlaced.mp4 -r 50 -c:v h264_nvenc -preset lossless -f mp4 deinterlaced_cuvid.mp4 -y
ffmpeg -i deinterlaced_yadif.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_yadif_cuda.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_mcdeint.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_bwdif.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_w3fdif.mp4 -i original.mp4 -lavfi ssim -f null -
ffmpeg -i deinterlaced_cuvid.mp4 -i original.mp4 -lavfi ssim -f null -
pause

EDIT: added w3fdif result too

And for completeness, the same test for the Big Buck Bunny source:

yadif 0.993549
yadif_cuda 0.993355
mcdeint 0.997456
bwdif 0.994291
w3fdif 0.990322
cuvid 0.948984

It would be interesting to know if the relatively poor performance of cuvid deinterlacing is a result of the way FFmpeg works or the algorithm itself.

Are you sure that it is poor quality? Can you check that it is not just some kind of frame shift, so that the comparison is not being made on the same frame?

From our tests, adaptive deinterlace was better than yadif.

Yes, that’s quite possible, especially given what langedalepl was saying about the FFmpeg integration. Maybe it’s out of sync by a frame or something.

Certainly it doesn’t look visually awful as the SSIM suggests it should.

I’d have to figure out how, but maybe I could drop the first frame of the cuvid decode from the comparison or something and see what results that gives.
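
One way this could be done is with the trim and setpts filters ahead of ssim, so that the first frame(s) of one input are discarded and the timestamps are reset before the two streams are paired up. This is only a sketch: ssim_skip_graph is a made-up helper name, the file names are the ones used earlier in the thread, and it has not been run against those files. The shift could also be in the other direction, in which case the trim would go on the second input instead.

```shell
# ssim_skip_graph N: print a filtergraph that drops the first N frames of
# input 0, resets its timestamps, and then compares it with input 1.
# Hypothetical helper; it only builds the string for ffmpeg's -lavfi option.
ssim_skip_graph() {
    printf '[0:v]trim=start_frame=%d,setpts=PTS-STARTPTS[a];[a][1:v]ssim\n' "$1"
}

# Show the graph that skips one frame.
ssim_skip_graph 1

# Intended use (requires ffmpeg on PATH):
# ffmpeg -i deinterlaced_cuvid.mp4 -i original.mp4 -lavfi "$(ssim_skip_graph 1)" -f null -
```

Trying a few small values of N and watching whether the SSIM jumps would show whether a simple offset explains the low score.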

The thing is, it used to work before a specific commit between the 3.3 and 3.4 series (I mention exactly which commit in the ffmpeg ticket). NVIDIA promotes the support of their APIs in ffmpeg, so in my opinion the NVIDIA and ffmpeg devs should work more closely together to produce a fully working solution. As it is, cuvid deinterlacing is not working 100%: it works in some usage scenarios but not all. The main problem is that NVIDIA doesn’t care much about interlaced content, although it should be a very important factor. They assume everything is progressive now, which it isn’t. Look, for example, at the issue with the new b-pyramid option, which doesn’t work when doing field encoding. On newer Turing hardware they dropped field encoding completely, etc.

Since you built the yadif_cuda filter, can you also port some more filters to CUDA versions? delogo is one; overlay is the more important one. I don’t know if it is possible, though.

So the nb_frames as read by:

ffprobe -v error -show_format -show_streams .mp4

produced the value 38074 for the original and all deinterlaced files… except deinterlaced_cuvid, which shows 38100.

So yeah, FFmpeg cuvid deint is broken I guess.
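
To make the mismatch above concrete, here is a small sketch for pulling just the frame count out of ffprobe and comparing two files. The function names count_frames and frame_surplus are made up for illustration; the ffprobe options restrict the output to the nb_frames field of the first video stream.

```shell
# count_frames FILE: print nb_frames for the first video stream, as stored
# in the container. Hypothetical helper; requires ffprobe on PATH.
count_frames() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=nb_frames \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# frame_surplus A B: print how many more frames A has than B.
frame_surplus() {
    echo $(( $1 - $2 ))
}

# With the two counts reported above (cuvid output vs. everything else):
frame_surplus 38100 38074

# With real files:
# frame_surplus "$(count_frames deinterlaced_cuvid.mp4)" "$(count_frames original.mp4)"
```

With the numbers from this thread the cuvid output has 26 frames more than the original, so the SSIM filter cannot be pairing up the same frames throughout.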

I have a working overlay_npp filter; I would like to commit it to FFmpeg in the next few weeks. overlay_cuda is more complicated.

Try with -r XX on the input instead of the output; does it make any difference to nb_frames?

This is great, looking forward to it. Will it also be able to work with the ass (subtitle) filter, or does that also need to be converted to an npp/cuda version?