Possible driver or SDK bug when encoding interlaced video with nvenc

Current system: Ubuntu 18.04.1 LTS
NVIDIA driver: 410.73
NVIDIA graphics card: Quadro P5000

I was trying to encode interlaced video captured from a Blackmagic Design DeckLink device with nvenc (h264) into segments approximately 2 seconds long using ffmpeg. However, ffmpeg failed to find a point at which to segment the video, and the second segment was never created. I would like to share my problem and get some help or answers.

Long story short, it seems that when ffmpeg’s ildct flag is set, the packet information retrieved from nvenc never marks an encoded packet as a key frame. However, when the resulting file (an undesirably long first segment) is analyzed with ffprobe, there appear to be IDR frames (key frames). I found two ffmpeg tickets related to this issue and thought there might be a problem with the driver or the SDK. The tickets are:

https://trac.ffmpeg.org/ticket/5440
https://trac.ffmpeg.org/ticket/7080

Now for the more elaborate version of my experience with this issue. The ffmpeg command I used is along the following lines.

ffmpeg -f [DeckLink device specific options] -i [decklink device name] -c:v h264_nvenc -b:v 5000k -r 30000/1001 -g 15 -pix_fmt yuv420p -flags ilme+ildct+cgop -f segment -segment_time 2 -segment_format mpegts -segment_list list.m3u8 segment_%03d.ts

I am not entirely sure how ffmpeg works, but as far as I can tell:

  1. ffmpeg requests h264_nvenc to encode something
  2. receives the result through h264_nvenc(nvenc.c)
  3. sends result to the segment muxer(segment.c)
  4. segment muxer requests mpegts muxer(mpegtsenc.c) to finally write the packet

Whether the file must be segmented or not is decided by the segment muxer. One of the conditions that must be met for the file to be segmented is that the packet being written is an IDR frame (key frame). Since none of the packets were ever marked as key frames, the file was never segmented.
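
To illustrate that condition, here is a rough Python sketch. It is a simplified model and not the actual segment.c logic: the `Packet` class, its `key` field (standing in for the key frame flag on a packet) and `assign_segments` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    pts: float  # presentation time in seconds
    key: bool   # stand-in for the key frame flag reported by the encoder


def assign_segments(packets: List[Packet], segment_time: float = 2.0) -> List[int]:
    """Return the segment index each packet would be written to.

    A new segment is started only at a packet that is flagged as a key frame
    and whose pts has reached the end of the current segment.
    """
    index = 0
    result = []
    for pkt in packets:
        if pkt.key and pkt.pts >= (index + 1) * segment_time:
            index += 1
        result.append(index)
    return result


# 5 seconds at 30 fps with a key frame every 15 frames (as with -g 15).
flagged = [Packet(pts=i / 30, key=(i % 15 == 0)) for i in range(150)]
# Same stream, but the key flag is never set (what I observe with ildct).
unflagged = [Packet(pts=i / 30, key=False) for i in range(150)]

print(max(assign_segments(flagged)))
print(max(assign_segments(unflagged)))
```

In this model the second stream stays in segment 0 forever, which matches the symptom: one first segment that keeps growing.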

Contrary to what is reported during encoding, the resulting (undesirably long) first segment definitely contains IDR frames: I analyzed it with ffprobe, and it is segmented as desired when used as input. I’m assuming that the packets are encoded correctly by the h264_nvenc encoder, but that the information nvenc provides when the bitstream is locked to read its state is not correct.

For now I have worked around this problem by adding some code to the segment muxer that checks whether the packet contains a NAL unit identifying it as an IDR frame and, if so, manually sets the key frame flag.
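
The kind of check involved can be sketched roughly as follows. This is Python for illustration only, not the code I added to the segment muxer. `contains_idr` is a hypothetical name, and the sketch assumes the packet is in Annex B byte-stream form (NAL units prefixed by 00 00 01 or 00 00 00 01 start codes). In H.264 the NAL unit type is the low 5 bits of the first byte after the start code, and type 5 is a slice of an IDR picture.

```python
def contains_idr(packet: bytes) -> bool:
    """Return True if an Annex B H.264 packet holds a NAL unit of type 5."""
    start_code = b"\x00\x00\x01"  # also matches the tail of 00 00 00 01
    pos = 0
    while True:
        pos = packet.find(start_code, pos)
        # Stop when no start code is left or no header byte follows it.
        if pos < 0 or pos + 3 >= len(packet):
            return False
        nal_unit_type = packet[pos + 3] & 0x1F
        if nal_unit_type == 5:  # coded slice of an IDR picture
            return True
        pos += 3


# Hand-made example packets (header bytes only, payloads are dummies).
idr_packet = (
    b"\x00\x00\x00\x01\x09\xf0"      # access unit delimiter (type 9)
    b"\x00\x00\x00\x01\x67\x64\x00"  # SPS (type 7)
    b"\x00\x00\x01\x65\x88\x84"      # IDR slice (type 5)
)
non_idr_packet = b"\x00\x00\x00\x01\x41\x9a\x02"  # non-IDR slice (type 1)

print(contains_idr(idr_packet), contains_idr(non_idr_packet))
```

A muxer-side workaround would then set the key frame flag on the packet whenever such a check succeeds, instead of trusting the flag coming from the encoder.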

I would like to know…

  1. if it is a driver or SDK issue
  2. if this problem can be reproduced on Windows with the latest drivers
  3. if there are possibilities other than a driver or SDK issue

Hi,

The issue happens because of how nvidia encodes interlaced video. As oromit posted in the second ticket you mention (which I opened), “nvenc does not support encoding interlaced fields as one interleaved frame.
Instead, it emits two independent fields per frame. (no support for MBAFF)
Due to each field needing its own timestamp, but being in the same packet with only one shared timestamp, FFmpeg does not properly support handling those coming out of an encoder, and how well it works is pretty much up to luck depending on the container and transport method in use.
I looked into it a while ago, but concluded that fixing that would be a major non-trivial change to pretty much the entire FFmpeg codebase, which I’m not going to be able to make on my own. I’d even say it’s unlikely that this will ever be fixed, unless nvidia implements mbaff support.”
and
“Because the two fields have different parameters. Like, one of them can be a keyframe, but the other isn’t, and whichever one comes first gets its parameter set on the packet, but the ones for the other field are lost. That includes its timestamp, and flags like being keyframe.
If it works depends on how much the output format depends on the information in the packet. Some don’t use it at all, others rely heavily on it.
The two fields are combined in one packet, which is exactly the problem. If I split them, I end up with the rest of ffmpeg exploding because an encoder is not expected to return two packets for one input frame, even though the new API technically supports it.”

And as we have seen, the new RTX series drops field encoding completely, so I am not very optimistic about how this goes. Not properly supporting field encoding is a very bad decision: it is still very much in use wherever broadcast TV streams have to be transcoded.

Hello malakudi,

Thank you for taking the time to repeat what was mentioned in the ticket in more detail. Having everything written out in a single post instead of a thread of comments made it a lot easier to process in my head. I would be grateful if you could confirm my understanding and answer some additional questions.

This is what I understood.

  1. From what I learned by doing some more research after reading your post, interlaced video can use frame coding or field coding. Frame coding is coding two fields as a frame whereas field coding is coding fields separately. And nvenc only supports field coding.

  2. Even with field coding, the two independently coded fields are contained in a single packet sharing one timestamp.

This is my question.
According to your post, I think you are saying that nvenc’s interlaced encoding is correct, at least in how it uses field coding, and that something more has to be done on the ffmpeg side?

You seem to know a lot more about ffmpeg and encoding than I do, so I don’t want to argue with you, but I can’t help questioning why nvenc does not mark a packet’s picture type as NV_ENC_PIC_TYPE_IDR when the frame is encoded as an IDR frame. And like I said, IDR frames are observed in the encoded bitstream when the NAL units are analyzed.

Interlaced video can be MBAFF or PAFF. FFmpeg supports/prefers MBAFF, nvidia supports/prefers PAFF.
Quoting from the Wikipedia article about H.264:
“Flexible interlaced-scan video coding features, including:
Macroblock-adaptive frame-field (MBAFF) coding, using a macroblock pair structure for pictures coded as frames, allowing 16×16 macroblocks in field mode (compared with MPEG-2, where field mode processing in a picture that is coded as a frame results in the processing of 16×8 half-macroblocks).
Picture-adaptive frame-field coding (PAFF or PicAFF) allowing a freely selected mixture of pictures coded either as complete frames where both fields are combined together for encoding or as individual single fields.”

The issue is actually ffmpeg’s poor support for PAFF, if I have understood the issue correctly. You can reproduce the exact same issue with a sample file encoded in PAFF from the ffmpeg samples, here: https://samples.ffmpeg.org/V-codecs/h264/PAFF/Sat1PAFF.ts . Your test command above will fail to create segments on keyframes with this PAFF input.

nvidia supporting MBAFF would fix the issue, but since they completely disabled field encoding on their Turing hardware, I don’t see any progress in field encoding happening in the future.

A way to “workaround” this bug is the following (with your example above):
ffmpeg -f [DeckLink device specific options] -i [decklink device name] -c:v h264_nvenc -b:v 5000k -r 30000/1001 -g 15 -pix_fmt yuv420p -flags ilme+ildct+cgop -f mpegts 'udp:238.10.1.1:1234?pkt_size=1316'
ffmpeg -i udp:238.10.1.1:1234 -c:v copy -f segment -segment_time 2 -segment_format mpegts -segment_list list.m3u8 segment_%03d.ts

The first instance outputs MPEG-TS over UDP multicast, then the second instance does the segmenting with -c:v copy.

This works because the mpeg-ts muxer takes its keyframe information from a different place than the hls muxer. I tried to find where the difference is in the source code, but I couldn’t; my understanding of the code is not good enough.

Thank you for suggesting a workaround but I already know about that solution as I have tried it with files. Anyway, I’m glad you brought that up.

Suppose I transcoded video file A and got video file B.
Tell me if I’m wrong, but I expect the encoder output for A to be the same as the demuxed output of B.
This assumption rests on the rationale that the muxer should have no knowledge of where the packets it has to mux originated. And that is why I think we end up with a workaround like your suggestion.

Considering this, and assuming the muxer is working properly, I think there is a problem with the encoder.

I appreciate you trying to explain it in your own terms, but it is not convincing to me as I dig deeper into the problem. If nvidia not supporting MBAFF were the actual problem, I think our workaround should not work either.

The workaround kind of works because different muxers handle packets and timestamps differently. If you try to do a vcodec copy you will get the warning “Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly”, which indicates that the output is not 100% compliant. As I already said, this is just my understanding of what oromit has said, and since he is the maintainer of the nvenc code in ffmpeg, he must know better than anyone.