Flushing CUDA decoder for live H264 stream decode

I am using the CUDA decoder to decode H264 packets streamed via TCP. The decoder itself works fine, but there seems to be built-in latency: if N packets are received, only M frames are produced (where M < N). When a new packet comes in, frame M+1 is output. The question is: is there a way to flush the decoder?

There is a discussion on another forum that talks about the same issue.

  1. You can flush the decoder by simply setting the EndOfStream flag on a dummy empty packet (set flag to CUVID_PKT_ENDOFSTREAM).
  2. For decoding immediately after seeking, one simple way is to deliver a dummy EndOfStream packet (to flush the decoder and reset the internal state).

Note that resetting the state also means that the decoder will not start decoding again until it gets valid SPS and PPS NALUs, so if you want to seek to non-IDR frames that are not preceded by an SPS/PPS, you may want to send dummy SPS/PPS NALUs to the decoder. (This would also apply to streams that do not carry the SPS/PPS as part of the elementary video stream, e.g. MP4.)
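The dummy end-of-stream packet described above can be sketched as follows. This is only an illustration: in a real build you would `#include <nvcuvid.h>` and pass the packet to `cuvidParseVideoData()`; the stand-in type and flag definitions below are trimmed mirrors of the SDK declarations so the sketch is self-contained, and the exact field set of the real struct should be checked against your SDK headers.

```c
/* Sketch: flushing the NVCUVID parser with a dummy end-of-stream packet.
 * Stand-in declarations below mirror only the fields this sketch touches;
 * a real build would use <nvcuvid.h> instead. */

typedef enum {
    CUVID_PKT_ENDOFSTREAM = 0x01   /* flag value as defined in nvcuvid.h */
} CUvideopacketflags;

typedef struct {
    unsigned long flags;           /* CUvideopacketflags */
    unsigned long payload_size;    /* bytes of bitstream data */
    const unsigned char *payload;  /* pointer to bitstream data */
    long long timestamp;           /* presentation timestamp */
} CUVIDSOURCEDATAPACKET;           /* trimmed mirror of the real struct */

/* Fill a dummy packet that tells the parser to flush all pending frames.
 * The real call afterwards would be: cuvidParseVideoData(parser, &pkt); */
void fill_eos_packet(CUVIDSOURCEDATAPACKET *pkt)
{
    pkt->flags        = CUVID_PKT_ENDOFSTREAM;
    pkt->payload_size = 0;   /* dummy packet: no bitstream data */
    pkt->payload      = 0;
    pkt->timestamp    = 0;
}
```

Per the note above, after this flush the parser state is reset, so decoding resumes only once the stream delivers (or you re-send) valid SPS/PPS NALUs.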

Has anybody been able to flush the decoder to generate the decoded picture immediately and keep the decoder alive? Simply sending CUVID_PKT_ENDOFSTREAM stops the decoder. In my case, the client image and the server image need to remain in sync. The server is generating H264 packets using the NVIDIA GRID SDK.

my knowledge of h264 specifics is limited; hence, at the risk of coming across as ignorant:

“…but there seems to be built in latency.”

a latency, or dependency?
from what you are describing, it seems to be a dependency, rather than a latency

the decoder needs x + y input, to be able to create x output

if so, one could perhaps repeat input packets and merely flag packets as used/not used, or relevant/not relevant; the decoder receives the packets necessary to complete the output, and the redundant packets used to facilitate the process are discarded

Regarding the “a latency or dependency” comment above: it is latency. I understand that the decoder needs more than one H264 packet to generate a display frame. The reason it is a latency issue is that if you manually flush the decoder using a CUVID_PKT_ENDOFSTREAM packet, the decoder flushes out a few display frames, indicating that there is data pending decoding.

In my case, I need to make sure that the client and server image are in-sync.

again, my h264 knowledge is limited

my feeling is that you may not find a solution off-the-shelf, but would rather need to write/ add custom code to accomplish what i think you wish to accomplish

if you could control the number of packets the client is to pass on to the server, you should be able to better control the synchronization, i would think

the logic would then be:
  1. if the client has not yet sent the required number of packets for the given time-frame or time-slot, flush the decoder
  2. count the number of packets delivered via flushing within the current time-slot, and add the balance to the next cycle
  3. the server would then be able to expect a more or less steady, timely arrival of packets, and can nicely space the packets, keeping excess packets for the next cycle
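The pacing idea above could be sketched roughly as follows. All names here are hypothetical and the policy is just one reading of the suggestion: each time-slot expects a fixed number of packets, any excess is carried into the next cycle, and a shortfall signals that a flush is needed.

```c
/* Hypothetical pacing sketch for the per-time-slot logic suggested above. */

typedef struct {
    int expected_per_slot;   /* packets expected in each time-slot */
    int carried;             /* excess packets carried from the last slot */
} Pacer;

/* Call at the end of a time-slot with the number of packets received.
 * Returns 0 if the slot was satisfied (excess is kept for the next
 * cycle), or the shortfall -- a nonzero return would trigger a flush. */
int end_of_slot(Pacer *p, int received)
{
    int total = received + p->carried;
    if (total >= p->expected_per_slot) {
        p->carried = total - p->expected_per_slot;  /* keep the excess */
        return 0;
    }
    p->carried = 0;
    return p->expected_per_slot - total;  /* shortfall: flush the decoder */
}
```

For example, with 5 packets expected per slot, a slot that receives 7 is satisfied and carries 2 forward; a following slot that receives only 2 comes up 1 short (2 received + 2 carried) and would trigger a flush.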

perhaps

[i hope my directions are right]