Low Latency Decoding Issue

Hi,

I’m using the latest NVIDIA Video Codec SDK (8.2.15) to decode a raw H.264 file, and I found something weird.

The decoder is initialized as follows, with nothing else configured, to decode full HD in low-latency mode.

m_nv_decoder.reset(new NvDecoder(m_cu_ctx,
            1920,                // width
            1080,                // height
            false,               // bUseDeviceFrame: frames are returned in host memory
            cudaVideoCodec_H264, // codec
            NULL,                // no mutex
            true, // Enable Low Latency
            true));

Since I know the exact size of each NAL packet, I can see how many frames NvDecoder returns for each packet. The first column is the size of the packet fed to the decoder.

Quadro K4000
24      USED 0.016717 ms GOT 0 FRAMES
512486  Session Initialization Time: 36 ms
USED 38.4174 ms GOT 0 FRAMES

The first packet is 24 bytes; that is the header of the NAL stream. After it comes the IDR frame, which is big (about 512 kB), and NvDecoder spent 36 ms initializing the session, which seems okay.

Then I got this strange output:

72667   USED 0.164528 ms GOT 0 FRAMES
39816   USED 0.112324 ms GOT 0 FRAMES
27041   USED 0.097954 ms GOT 0 FRAMES
86387   USED 7.14801 ms GOT 0 FRAMES
30537   USED 4.89243 ms GOT 0 FRAMES
17412   USED 4.07976 ms GOT 0 FRAMES
16959   USED 3.63545 ms GOT 0 FRAMES
16283   USED 5.54467 ms GOT 0 FRAMES
16169   USED 4.56718 ms GOT 0 FRAMES
15433   USED 4.1563 ms GOT 0 FRAMES
30526   USED 4.56542 ms GOT 0 FRAMES
51047   USED 4.39122 ms GOT 0 FRAMES
61002   USED 6.50779 ms GOT 1 FRAMES
47014   USED 3.17647 ms GOT 1 FRAMES
45105   USED 4.45603 ms GOT 1 FRAMES
31550   USED 5.04728 ms GOT 1 FRAMES
30662   USED 5.35434 ms GOT 1 FRAMES
30436   USED 3.4586 ms GOT 1 FRAMES
30131   USED 4.3264 ms GOT 1 FRAMES
43660   USED 4.71265 ms GOT 1 FRAMES
44534   USED 4.34987 ms GOT 1 FRAMES
44803   USED 3.1841 ms GOT 1 FRAMES
45012   USED 4.39855 ms GOT 1 FRAMES
31626   USED 4.36482 ms GOT 1 FRAMES

The decoder accepted the frame data but did not return a frame until the 13th packet after the IDR frame.

At the end

0       USED 26.3687 ms GOT 13 FRAMES

It returns the remaining 13 frames all at once.
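
For what it's worth, these counts are what a fixed output delay would produce. Below is a toy model in plain C++, not NVDEC code: `DelayedDecoder` and `simulate` are hypothetical names made up for illustration, and the hold depth of 13 is an assumption read off the log, not a documented decoder setting. Fed 25 pictures (the IDR plus the 24 packets listed), it gives no output for the first 13 calls, one frame for each of the next 12, and 13 frames on flush.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model (assumption): a decoder that holds `delay` pictures
// before it releases any of them.
class DelayedDecoder {
public:
    explicit DelayedDecoder(std::size_t delay) : delay_(delay) {}

    // Push one complete picture; returns how many pictures come out.
    int push(int pictureId) {
        held_.push_back(pictureId);
        int out = 0;
        while (held_.size() > delay_) {
            held_.pop_front();
            ++out;
        }
        return out;
    }

    // End of stream: everything still held is released at once.
    int flush() {
        int out = static_cast<int>(held_.size());
        held_.clear();
        return out;
    }

private:
    std::size_t delay_;
    std::deque<int> held_;
};

// Feeds `pictures` pictures and returns the output count of every push,
// with the flush count appended as the last element.
std::vector<int> simulate(int pictures, std::size_t delay) {
    assert(pictures >= 0);
    DelayedDecoder dec(delay);
    std::vector<int> counts;
    for (int i = 0; i < pictures; ++i) {
        counts.push_back(dec.push(i));
    }
    counts.push_back(dec.flush());
    return counts;
}
```

Calling `simulate(25, 13)` reproduces the pattern of the GOT column, which suggests that 13 pictures are being held somewhere in the pipeline; the model says nothing about why.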

So my question is: is it possible to have each frame returned immediately rather than cached? My application needs the latency to be as low as possible, and I don’t want to get 13 frames at once. Is there a parameter I have to set?

Thank you very much!

Are you using sample code for the decoder? If so, which one? Also, what is the GOP structure of your video file?

Hi

Yes, I’m using the NvDecoder from the sample code directly. My video file is raw H.264, and I recorded each packet’s offset in the file so that I can feed the decoder complete packets.

I’d also like to try the low-level API.

Thanks.

Unfortunately, you didn’t answer my question about your video file. Can you please give us the GOP structure of the video file, or provide a link to a sample video file? Thank you.

If you have a typical stream with I, P, and B frames, you cannot expect to decode with zero delay, due to frame re-ordering. If you had a stream without B frames it may be possible. I’m not sure about IPPP… I’d have to try it. But IIIIII… should be fine.
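
The re-ordering point can be illustrated with a toy model in plain C++. This is a sketch only, not how any real decoder is implemented: `reorder` and `ReorderResult` are hypothetical names, pictures are identified by an assumed display index (POC), and the decoder is assumed to hold back `reorderDepth` pictures so that it can always release the one with the lowest display index.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct ReorderResult {
    std::vector<int> outPerPush;   // pictures released after each push
    std::vector<int> displayOrder; // display indices in release order
};

// pocDecodeOrder: display index of each picture, listed in decode order.
// reorderDepth:   pictures held back to restore display order
//                 (0 when decode order equals display order).
ReorderResult reorder(const std::vector<int>& pocDecodeOrder, int reorderDepth) {
    assert(reorderDepth >= 0);
    ReorderResult r;
    std::set<int> held; // kept sorted by display index
    for (int poc : pocDecodeOrder) {
        held.insert(poc);
        int out = 0;
        while (held.size() > static_cast<std::size_t>(reorderDepth)) {
            r.displayOrder.push_back(*held.begin());
            held.erase(held.begin());
            ++out;
        }
        r.outPerPush.push_back(out);
    }
    // End of stream: release whatever is still held, lowest index first.
    for (int poc : held) {
        r.displayOrder.push_back(poc);
    }
    return r;
}
```

With an IPPP… stream, `reorder({0, 1, 2, 3}, 0)` releases one picture per push. With an I P B B pattern in decode order, `reorder({0, 3, 1, 2, 6, 4, 5}, 1)` releases nothing on the first push and one picture on each push after that, so at least one picture of delay is unavoidable there.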

Hi

I dumped the raw H.264 file with ffprobe.

[FRAME]
media_type=video
stream_index=0
key_frame=1
pkt_pts=N/A
pkt_pts_time=N/A
pkt_dts=N/A
pkt_dts_time=N/A
best_effort_timestamp=N/A
best_effort_timestamp_time=N/A
pkt_duration=48000
pkt_duration_time=0.040000
pkt_pos=0
pkt_size=512510
width=1920
height=1080
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=I
coded_picture_number=0
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=N/A
pkt_pts_time=N/A
pkt_dts=N/A
pkt_dts_time=N/A
best_effort_timestamp=N/A
best_effort_timestamp_time=N/A
pkt_duration=48000
pkt_duration_time=0.040000
pkt_pos=512510
pkt_size=72667
width=1920
height=1080
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=P
coded_picture_number=1
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]

The size 512510 = 24 + 512486 is the header plus the first I-frame; the remaining frames are all P-frames. The encoder has B-frames disabled and produces only I- and P-frames.

I just want decoding to be as fast as possible. Having 13 frames jump out at the end is very weird.

A latency of 2-3 frames would be reasonable and acceptable.

Thanks.

Can you give us a link to a sample video? I don’t have any IPPPP… streams at hand and gearing up to make one would take a lot of time. If you provide a sample I can run some experiments.

http://insaneboard.de/download/?dir=Ambilight/Testvideos

The file RGB Test Sequence - Youtube.mp4 contains only I- and P-frames; I verified this with ffprobe.

Thank you very much.

Thank you. I will run some experiments and report back.

I built the sample AppDec, which uses the NvDecoder code. But it also uses ffmpeg for demuxing, which sends bunches of NALUs at a time. You are talking about sending NALUs one-by-one. So you must have created your own application. Can you share that with us? Without it we can’t reproduce your results.

As an aside I tested the file with my decoder, which is not set up for low latency and which sends data NALU-by-NALU. Here is what I see in order:

SPS
PPS
IDR slice
non-IDR slice
HandleVideoSequence()
non-IDR slice
HandlePictureDecode()
non-IDR slice
HandlePictureDecode()
non-IDR slice
HandlePictureDecode()
HandlePictureDecode()
HandlePictureDisplay()

So without doing anything to try to get low latency, I have to push 5 frames to get the first picture display callback.
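
For anyone who wants to repeat the NALU-by-NALU test, here is a minimal sketch of splitting a raw stream into NAL units. It assumes the file is an H.264 Annex B byte stream (NAL units separated by 00 00 01 or 00 00 00 01 start codes); `Nal` and `splitAnnexB` are hypothetical names, and this is not the code I used. The type is the low 5 bits of the first byte after the start code: 7 is SPS, 8 is PPS, 5 is an IDR slice, and 1 is a non-IDR slice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Nal {
    std::size_t offset; // offset of the first byte after the start code
    std::size_t size;   // size in bytes, start code excluded
    int type;           // nal_unit_type, or -1 if nothing follows the start code
};

// Scans an Annex B byte stream for start codes and returns one entry per NAL unit.
std::vector<Nal> splitAnnexB(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::vector<Nal> nals;
    std::size_t i = 0;
    while (i + 3 <= n) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            // A 4-byte start code has one extra leading zero byte.
            std::size_t codeStart = (i > 0 && data[i - 1] == 0) ? i - 1 : i;
            if (!nals.empty()) {
                // The previous NAL unit ends where this start code begins.
                nals.back().size = codeStart - nals.back().offset;
            }
            std::size_t payload = i + 3;
            int type = payload < n ? (data[payload] & 0x1F) : -1;
            // Provisional size runs to the end of the buffer.
            nals.push_back({payload, n - payload, type});
            i = payload;
        } else {
            ++i;
        }
    }
    return nals;
}
```

Each returned entry can then be passed to the decoder as one packet, start code included if the parser expects it.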

I ran the AppDecLowLatency project and it appears to be giving one frame out per one frame in for your stream. So we really do need to see your application to help you.

Hi

Many thanks anyway. I have already started writing my own application to decode the video.

Thank you very much.

Solved