Possible reasons why cuvidDecodePicture may block for 20 seconds when decoding HEVC?

Our situation is the following:

We transcode two 4K HEVC video streams using ffmpeg on NVIDIA GPUs.

Occasionally (anywhere from a few times a day to once every couple of days) we observe this error: “Circular buffer overrun. To avoid, increase fifo_size URL option. To survive in such case, use overrun_nonfatal option”.

Right before the error we observe that the speed of transcoding drops drastically:

Apr 24 07:50:14 frame=4537333 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.06 bitrate=N/A dup=1 drop=0
Apr 24 07:50:15 frame=4537359 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.58 bitrate=N/A dup=1 drop=0
Apr 24 07:50:36 frame=4537369 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.78 bitrate=N/A dup=1 drop=0
Apr 24 07:50:36 frame=4537369 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.78 bitrate=N/A dup=1 drop=0
Apr 24 07:50:57 frame=4537371 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.82 bitrate=N/A dup=1 drop=0
Apr 24 07:51:07 frame=4537372 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.84 bitrate=N/A dup=1 drop=0

Pay attention to the second and third lines: it took 21 seconds (07:50:36 - 07:50:15) to transcode 10 frames (4537369 - 4537359), even though the stream runs at 50 fps.
If you look at the fifth and sixth lines, you will see that it took 10 seconds (07:51:07 - 07:50:57) to transcode just 1 frame.
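For anyone who wants to spot such stalls automatically, here is a minimal Python sketch that redoes the arithmetic above on the log excerpt. It assumes the timestamp prefix format shown in the excerpt; the helper names, the 10 fps threshold, and the placeholder year (the log lines carry no year) are my own assumptions, not anything that comes from ffmpeg.

```python
import re
from datetime import datetime

LOG = """\
Apr 24 07:50:14 frame=4537333 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.06 bitrate=N/A dup=1 drop=0
Apr 24 07:50:15 frame=4537359 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.58 bitrate=N/A dup=1 drop=0
Apr 24 07:50:36 frame=4537369 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.78 bitrate=N/A dup=1 drop=0
Apr 24 07:50:36 frame=4537369 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.78 bitrate=N/A dup=1 drop=0
Apr 24 07:50:57 frame=4537371 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.82 bitrate=N/A dup=1 drop=0
Apr 24 07:51:07 frame=4537372 fps= 50 q=-1.0 q=34.0 size=N/A time=25:12:28.84 bitrate=N/A dup=1 drop=0
"""

# Assumption: the log has no year, so an arbitrary one is used for parsing.
PLACEHOLDER_YEAR = 2000

LINE_RE = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) frame= *(\d+)")


def parse(line):
    """Return (wall-clock timestamp, frame counter) or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return ts, int(m.group(2))


def find_stalls(lines, min_fps=10.0):
    """Compare consecutive samples and report intervals slower than min_fps.

    min_fps is a hypothetical threshold chosen well below the 50 fps stream rate.
    """
    samples = [p for p in map(parse, lines) if p is not None]
    stalls = []
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            continue  # duplicate progress line with the same timestamp
        frames = f1 - f0
        if frames / seconds < min_fps:
            stalls.append((t0.strftime("%H:%M:%S"), t1.strftime("%H:%M:%S"), frames, seconds))
    return stalls


if __name__ == "__main__":
    for start, end, frames, seconds in find_stalls(LOG.splitlines()):
        print(f"{start} -> {end}: {frames} frame(s) in {seconds:.0f} s")
```

On the excerpt this flags three intervals, including the 10 frames in 21 seconds and the single frame in 10 seconds mentioned above.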

And it happened all of a sudden - as you can see in the logs, before that there had been 25 hours of successful transcoding and no errors/warnings.

I tracked down the culprit: the process is blocked inside the function cuvidDecodePicture (https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/cuviddec.c#L339). I can't investigate any further, though, because I don’t have the source code of this function. Has anybody run into the same issue?

More details:
Graphics card - GTX 1080/GTX 1050 (the error manifests itself no matter which card we’re using)
Driver version - 418.56 (we also tried 3xx.xx drivers, which didn’t help)
Docker as runtime environment (we tried nvidia images with versions 9.2-devel-ubuntu16.04 and 10.1-devel-ubuntu18.04 (https://hub.docker.com/r/nvidia/cuda/))
As input we’re using multicast mpegts HEVC video. Here’s an example of ffmpeg command:

ffmpeg -y -xerror -scan_all_pmts 0 \
    -hwaccel cuvid -c:v hevc_cuvid \
    -copyts -start_at_zero \
    -i "udp://@225.0.0.1:1234?fifo_size=688128" \
    -c:v hevc_nvenc \
    -map 0:0 -map 0:0 -map 0:1 \
    -rc vbr \
    -c:v:0 copy \
    -qmin:v:1 21 \
    -qmax:v:1 35 \
    -b:v:1 8000000 \
    -maxrate:v:1 8800000 \
    -bufsize:v:1 4000000 \
    -filter:v:1 scale_cuda=w=1920:h=1080 \
    -g 250 -r 50 \
    -c:a:0 copy \
    -f mpegts /dev/null
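
To put the fifo_size=688128 from the command above in perspective: the ffmpeg UDP protocol documentation describes fifo_size as a count of 188-byte packets, so it can be converted into bytes and into a rough "how long can the decoder stall before the buffer overruns" figure. The sketch below does that conversion. The 40 Mbit/s input bitrate is purely a hypothetical value for illustration (I have not stated our real stream bitrate here), and the function names are my own.

```python
TS_PACKET_BYTES = 188  # fifo_size is expressed in packets of this size


def fifo_bytes(fifo_size_packets):
    """Size of the UDP circular buffer in bytes."""
    return fifo_size_packets * TS_PACKET_BYTES


def seconds_until_overrun(fifo_size_packets, input_bitrate_bps):
    """Rough time an empty buffer takes to fill if nothing is read from it.

    Ignores UDP/IP overhead and assumes a constant input bitrate.
    """
    return fifo_bytes(fifo_size_packets) * 8 / input_bitrate_bps


if __name__ == "__main__":
    fifo_size = 688128            # value from the ffmpeg command
    assumed_bitrate = 40_000_000  # hypothetical input bitrate, bits per second
    print(f"buffer size: {fifo_bytes(fifo_size)} bytes")
    print(f"fills in about {seconds_until_overrun(fifo_size, assumed_bitrate):.1f} s "
          f"at {assumed_bitrate / 1e6:.0f} Mbit/s")
```

The point is only that a decoder stall of 20 seconds or more can plausibly exhaust even a buffer of this size, which would explain why the overrun error follows the slowdown; plug in the real input bitrate to get a meaningful number.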

What’s interesting is that the problem can’t be reproduced with the same video sample: after I reproduced the problem with the multicast stream, I successfully transcoded the same sample, which I’d kept as an mpegts file on the file system.

What’s even more interesting is that today we hit the same error at the same time on two servers that were transcoding the same streams in parallel.

What could cause the problem?

It seems this got fixed in the latest stable driver (430.14). There have been no hangs in the several days since I started transcoding with the new driver.