Hi!
I use NVIDIA DALI to decode videos and to calculate optical flow between video frames. My video decoding pipeline closely follows the examples in the DALI docs.
I had some questions I asked the DALI devs, but was redirected to this forum (original issue).
I have noticed that the quality of the returned optical flow differs greatly depending on the input video format. I have provided my sample input video, which I converted to .mov and .webm format, together with the calculated optical flow output for each below:
Input video
Flow output .mov file
Flow output .webm file
As you can see, the generated flow output differs significantly between the two file formats, and both results contain quite a lot of noise. I was wondering what causes this difference in quality. Does the generated optical flow quality really depend this strongly on the input video format? If so, what would be the ideal video format for generating high-quality optical flow, and how can I improve the quality of the returned optical flow frames?
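(For the comparison above I only eyeballed the output videos; a simple way to put a number on the noise is to look at flow-magnitude statistics per frame. `flow_stats` below is my own numpy helper, not part of DALI:)

```python
import numpy as np

def flow_stats(flow: np.ndarray):
    """Summarize a [H, W, 2] optical flow field by its mean and max
    displacement magnitude. A high mean magnitude on a mostly static
    scene is a quick indicator of a noisy flow estimate."""
    mag = np.hypot(flow[..., 0], flow[..., 1])
    return float(mag.mean()), float(mag.max())

# Sanity check: a uniform (3, 4) displacement field has magnitude 5 everywhere
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[..., 0] = 3.0
flow[..., 1] = 4.0
print(flow_stats(flow))  # → (5.0, 5.0)
```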
I hope switching to a different file type is not necessary, as I will be expecting .mov files as input to my pipeline. Many thanks in advance!
Best regards,
Renzo
EDIT: As a follow-up question, would using RGB (or BGR) video frames instead of grayscale frames make a difference in the resulting optical flow? For instance, some of OpenCV’s OF implementations require you to convert your input to grayscale, while DALI also accepts RGB video frames.
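For context on that follow-up: grayscale conversion is just a weighted sum of the RGB channels, so a grayscale-based estimator sees a single luminance plane instead of three color planes. The snippet below is a plain numpy sketch using the common ITU-R BT.601 luma weights; it is for illustration only, not what DALI's GPU `color_space_conversion` does internally:

```python
import numpy as np

def rgb_to_gray(frames: np.ndarray) -> np.ndarray:
    """Convert [N, H, W, 3] uint8 RGB frames to [N, H, W, 1] grayscale
    frames using the ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frames.astype(np.float32) @ weights
    return np.clip(gray, 0, 255).astype(np.uint8)[..., np.newaxis]

# A pure-red frame maps to luma 0.299 * 255 ≈ 76
frame = np.zeros((1, 2, 2, 3), dtype=np.uint8)
frame[..., 0] = 255
print(rgb_to_gray(frame)[0, 0, 0, 0])  # → 76
```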
Source code of my current video decoding pipeline:
import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
import nvidia.dali.types as types

batch_size = 1

# Nvidia OF presets that determine the speed/quality trade-off of the output optical flow
of_presets = {
    "slow": 0.0,
    "medium": 0.5,
    "fast": 1.0,
}
@pipeline_def
def video_pipe(filename: str, sequence_length: int, with_of: bool, goal_shape: "tuple[int, int]", preset: str = "slow"):
    """
    Pipeline that decodes a single video & simultaneously calculates the optical flow
    :param filename: Filename of the video file that should be decoded
    :param sequence_length: The number of frames returned per batch of decoding (if video length % sequence_length != 0, the last batch is padded with duplicate / repeated frames, so make sure to filter these away after decoding)
    :param with_of: Whether the pipeline should also return optical flow
    :param goal_shape: Goal shape (width, height) that the optical flow frames are resized to
    :param preset: The speed/quality level of the optical flow calculations
    :return: The decoded video frames (shape: [sequence_length, height, width, 3]) and, if with_of is set, the optical flow frames (shape: [sequence_length - 1, goal_shape[1], goal_shape[0], 2]; the raw flow is computed on a 4x4 output grid because of output_format=4, then resized to goal_shape)
    """
    video = fn.readers.video(device="gpu", filenames=filename, sequence_length=sequence_length,
                             shard_id=0, num_shards=1, random_shuffle=False, initial_fill=batch_size,
                             skip_vfr_check=True, pad_sequences=True, name="Reader",
                             file_list_include_preceding_frame=True)
    # gray = fn.color_space_conversion(video, image_type=types.DALIImageType.RGB, output_type=types.DALIImageType.GRAY)
    video_bgr = fn.color_space_conversion(video, image_type=types.DALIImageType.RGB, output_type=types.DALIImageType.BGR)
    if with_of:
        of = fn.optical_flow(video, output_format=4, preset=of_presets[preset], enable_temporal_hints=True)
        of = fn.resize(of, resize_x=goal_shape[0], resize_y=goal_shape[1])
        return video_bgr, of
    else:
        return video_bgr
class VideoDecoder:
    def __init__(self, filename: str, with_of: bool, goal_shape: "tuple[int, int]" = (None, None), sequence_length: int = 10):
        """
        Class that decodes a video file & simultaneously calculates the optical flow between frames
        :param filename: Filename of the video file that should be decoded
        :param with_of: Whether to also calculate and return optical flow
        :param goal_shape: Goal shape (width, height) that the optical flow frames are resized to
        :param sequence_length: The number of frames returned per batch of decoding (if video length % sequence_length != 0, the last batch is padded with duplicate / repeated frames, so make sure to filter these away after decoding)
        """
        self.with_of = with_of
        self.filename = filename
        self.pipe = video_pipe(sequence_length=sequence_length, with_of=with_of, goal_shape=goal_shape,
                               batch_size=1, num_threads=1, device_id=0, filename=self.filename, seed=123456)
        self.pipe.build()
        self.epoch_size = self.pipe.reader_meta("Reader")["epoch_size"]
        self.index = 0

    def run(self):
        """
        Runs a single decoding iteration. Returns None (or (None, None) when optical flow is enabled) once it has run the required number of iterations to fully decode the video.
        Be aware: if the number of video frames is not divisible by sequence_length, the last batch will contain duplicate / repeated frames
        :return: The decoded video frames (shape: [sequence_length, height, width, 3]) and, if with_of is set, the optical flow frames (shape: [sequence_length - 1, goal_shape[1], goal_shape[0], 2])
        """
        # Implementation based on src: https://github.com/NVIDIA/DALI/issues/2760
        if self.index < self.epoch_size:
            self.index += 1  # advance so decoding stops after the last batch
            pipe_output = self.pipe.run()
            frames = pipe_output[0].as_cpu().as_array()[0]
            if self.with_of:
                of = np.array(pipe_output[1][0].as_cpu())
                return frames, of
            else:
                return frames
        else:
            if self.with_of:
                return None, None
            else:
                return None
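For completeness, this is the kind of post-processing I do after the decoding loop to drop the duplicate frames that DALI pads the last sequence with; `trim_padding` is my own helper, not part of DALI:

```python
import numpy as np

def trim_padding(batches, total_frames: int) -> np.ndarray:
    """Concatenate decoded [sequence_length, H, W, C] batches and drop
    the padded / repeated frames at the end, which DALI appends when
    total_frames is not a multiple of sequence_length."""
    return np.concatenate(batches, axis=0)[:total_frames]

# Example: 3 batches of 4 frames, but the video only had 10 frames,
# so the last 2 frames are padding and get dropped
batches = [np.zeros((4, 8, 8, 3), dtype=np.uint8)] * 3
print(trim_padding(batches, 10).shape)  # → (10, 8, 8, 3)
```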