Question about optical flow quality

Hi!

I make use of Nvidia DALI to decompress videos and to calculate optical flow between video frames. My video decoding pipeline is very similar to the examples provided by the DALI docs.

I originally put these questions to the DALI developers, but was redirected to this forum (original issue).

I have noticed that the quality of the returned optical flow differs considerably depending on the input video format. Below are my sample input video, which I converted to .mov and .webm, and the optical flow output calculated from each:
Input video
Flow output .mov file
Flow output .webm file

As you can see, the generated flow output differs considerably between the two file formats, and both results contain quite a lot of noise. What causes this difference in quality? Does the quality of the generated optical flow really depend this strongly on the input video format? If so, which video format would be best for generating high-quality optical flow, or how else can I improve the quality of the returned optical flow frames?

I hope switching to a different file type is not necessary, since my pipeline will receive .mov files as input. Many thanks in advance!

Best regards,
Renzo

EDIT: As a follow-up question, would using RGB (or BGR) video frames instead of grayscale video frames make a difference to the resulting optical flow frames? For instance, some of OpenCV’s OF implementations require you to convert your input to grayscale, while DALI also accepts RGB video frames.
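To test the RGB versus grayscale question directly, the same decoded frames could be converted to grayscale and the two flow results compared. Below is a minimal NumPy sketch of such a conversion, assuming uint8 RGB frames with the channel as the last axis. The helper name `rgb_to_gray` is hypothetical, and the BT.601 luma weights are the ones commonly used for RGB-to-gray conversion; whether DALI uses the same weights internally is not something this sketch claims.

```python
import numpy as np

# BT.601 luma weights for the R, G and B channels (assumed conversion).
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def rgb_to_gray(frames: np.ndarray) -> np.ndarray:
    """Convert uint8 RGB frames of shape [..., 3] to grayscale of shape [..., 1]."""
    if frames.shape[-1] != 3:
        raise ValueError("expected the last axis to hold 3 colour channels")
    # Weighted sum over the channel axis, computed in float to avoid overflow.
    gray = frames.astype(np.float32) @ BT601_WEIGHTS
    # Round back to uint8 and keep a trailing channel axis of length 1.
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)[..., np.newaxis]
```

Feeding both the original and the converted frames through the same flow computation would show whether the colour channels affect the result for a given video.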

Source code of my current video decoding pipeline:

import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
import nvidia.dali.types as types

batch_size = 1

# Nvidia OF presets that determine the speed/quality trade-off of the output optical flow
of_presets = {
    "slow": 0.0,
    "medium": 0.5,
    "fast": 1.0
}


@pipeline_def
def video_pipe(filename: str, sequence_length: int, with_of: bool, goal_shape: (int, int), preset: str = "slow"):
    """
    Pipeline that decodes a single video & simultaneously calculates the optical flow
    :param with_of: Whether the video pipe should also return optical flow
    :param goal_shape: Goal shape of the optical flow frames
    :param filename: Filename of the video file that should be decoded
    :param sequence_length: The number of frames returned per batch of decoding (if video length % sequence_length != 0, it will return duplicate / repeated frames, so make sure to filter these away after decoding)
    :param grid_size: Height & width of the calculated optical flow are divided by grid_size, mostly used for optimization. Currently supports 1 & 4 only.
    :param preset: The speed and quality level of the optical flow calculations
    :return A tuple of the decoded video frames (shape: [sequence_length, width, height, 3]) and optical flow frames (shape: [sequence_length - 1, width / grid_size, height / grid_size, 2])
    """
    video = fn.readers.video(device="gpu", filenames=filename, sequence_length=sequence_length,
                             shard_id=0, num_shards=1, random_shuffle=False, initial_fill=batch_size,
                             skip_vfr_check=True, pad_sequences=True, name="Reader", file_list_include_preceding_frame=True)

    # gray = fn.color_space_conversion(video, image_type=types.DALIImageType.RGB, output_type=types.DALIImageType.GRAY)

    video_bgr = fn.color_space_conversion(video, image_type=types.DALIImageType.RGB, output_type=types.DALIImageType.BGR)

    if with_of:
        of = fn.optical_flow(video, output_format=4, preset=of_presets[preset], enable_temporal_hints=True)
        of = fn.resize(of, resize_x=goal_shape[0], resize_y=goal_shape[1])
        return video_bgr, of
    else:
        return video_bgr


class VideoDecoder:

    def __init__(self, filename: str, with_of: bool, goal_shape: (int, int) = (None, None), sequence_length: int = 10):
        """
        Class that contains functions to decode a video file & simultaneously calculates the optical flow between frames
        :param filename: Filename of the video file that should be decoded
        :param grid_size: Height & width of the calculated optical flow are divided by grid_size, mostly used for optimization. Currently supports 1 & 4 only.
        :param sequence_length: The number of frames returned per batch of decoding (if video length % sequence_length != 0, it will return duplicate / repeated frames, so make sure to filter these away after decoding)
        """
        self.with_of = with_of
        self.filename = filename
        self.pipe = video_pipe(sequence_length=sequence_length, with_of=with_of, goal_shape=goal_shape, batch_size=1, num_threads=1, device_id=0, filename=self.filename, seed=123456)
        self.pipe.build()
        self.epoch_size = self.pipe.reader_meta("Reader")["epoch_size"]
        self.index = 0

    def run(self):
        """
        Runs a single decoding iteration. Will return (None, None) once it has run the required number of iterations to fully decode the video.
        Be aware, if the number of video frames cannot be divided by sequence_length, in the last batch some duplicate / repeated frames will occur
        :return A tuple of the decoded video frames (shape: [sequence_length, width, height, 3]) and optical flow frames (shape: [sequence_length - 1, width / grid_size, height / grid_size, 2])
        """
        # Implementation based on src: https://github.com/NVIDIA/DALI/issues/2760
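        if self.index < self.epoch_size:
            pipe_output = self.pipe.run()
            # Advance the iteration counter; without this the end of the video is never detected
            self.index += 1
            frames = pipe_output[0].as_cpu().as_array()[0]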
            if self.with_of:
                of = np.array(pipe_output[1][0].as_cpu())
                return frames, of
            else:
                return frames
        else:
            if self.with_of:
                return None, None
            else:
                return None
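The docstrings above note that the last batch contains extra frames when the video length is not a multiple of sequence_length, and that these should be filtered out after decoding. A minimal sketch of that bookkeeping follows, assuming the reader returns consecutive, non-overlapping sequences and that the total frame count is known from elsewhere. The helper name and the `total_frames` parameter are hypothetical and not part of DALI.

```python
def valid_frames_in_sequence(total_frames: int, sequence_length: int, sequence_index: int) -> int:
    """Return how many frames of the given sequence are real video frames.

    Assumes sequence k starts at frame k * sequence_length (no overlap, stride 1).
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    start = sequence_index * sequence_length
    # Full sequences yield sequence_length frames; the last one may be shorter.
    return max(0, min(sequence_length, total_frames - start))
```

With such a count `n`, the decoded frames could be trimmed to `frames[:n]`, and the flow frames to `of[:max(0, n - 1)]`, since there is one flow frame fewer than video frames per sequence.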

Hello Renzo,

thanks for following up here on the forums. I passed your questions on to our experts and hope to hear back from them soon.

Hello Renzo,

We are unable to access the video links you have shared. Can you please check again?

Hi,

Thank you for the reply, and strange to hear the videos are inaccessible.

I have uploaded them elsewhere; please let me know if these are accessible:
Input video
.mov output
.webm output

I believe I mixed up the files in my original question. It is actually the .webm output that shows lower-quality optical flow and the blocky patterns compared to the .mov output, not the other way round.
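To put a number on the difference between two flow outputs rather than judging it by eye, one option is the mean per-pixel distance between the two flow fields. The sketch below assumes both outputs are NumPy arrays of identical shape with the two flow components on the last axis, and that the frames of both transcodes are aligned. The function name is hypothetical.

```python
import numpy as np


def mean_endpoint_difference(flow_a: np.ndarray, flow_b: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding flow vectors of two flow fields."""
    if flow_a.shape != flow_b.shape:
        raise ValueError("flow fields must have the same shape")
    if flow_a.shape[-1] != 2:
        raise ValueError("expected the last axis to hold the 2 flow components")
    # Difference of the flow vectors, in float64 to avoid precision loss.
    diff = flow_a.astype(np.float64) - flow_b.astype(np.float64)
    # Length of each difference vector, averaged over all pixels and frames.
    return float(np.linalg.norm(diff, axis=-1).mean())
```

A value of 0 means the two outputs agree everywhere; larger values mean the flow vectors diverge more on average.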

Thanks!

Thank you. Can you provide direct download links for the .mov and .webm input video files? Starting from the YouTube input video would already involve two video transcodes, which is not the same as your experiment.

Hi, I understand. I had some trouble uploading directly to this forum post, so I have uploaded them here. I have additionally added output from a .mov format video, so there’s more material to compare. All the flow output videos have the .mp4 file extension because I used OpenCV to write them, but the differences between the outputs should still be visible. Please let me know if I need to provide anything else.

Thanks.

Here is a new link to the videos that does not expire.