SRT H.264 video source

Hello,

I am currently trying to compare software- and hardware-based conversion of H.264 → RGB. For that I implemented an FFmpeg-based video source operator that receives an SRT stream and decodes the H.264.

Is it possible to profile the FFmpeg conversion using Nsight Systems? The process spawned for the ffmpeg command does not show any traces.

Is there a better way to do this? Is it possible to bypass the host machine and receive the frames directly on the device?

Thanks in advance

import subprocess

import cupy as cp
import nvtx

import holoscan as hs
from holoscan.core import Operator, OperatorSpec
from holoscan.gxf import Entity


class FFmpegSRTStreamSourceOp(Operator):
        
    def __init__(self, fragment, url, height, width, n_channels, *args, **kwargs):
        self.height = height
        self.width = width
        self.n_channels = n_channels
        self.url = url
        self.buffer_size = self.width*self.height*self.n_channels  # bytes per raw RGB frame
        self.ffmpeg_command = ['ffmpeg',
                               '-nostdin',                     # don't read commands from stdin
                               '-max_delay', '0',
                               '-y', '-vsync', '0',            # passthrough, no frame duplication/dropping
                               '-hwaccel_device', '0',         # use GPU 0
                               '-hwaccel', 'cuda',             # CUDA-accelerated decoding
                               '-fflags', 'nobuffer', '-flags', 'low_delay', '-strict', 'experimental',
                               '-i', url,                      # SRT input stream
                               '-pix_fmt', 'rgb24',            # convert decoded frames to packed RGB
                               '-s', f'{width}x{height}',
                               '-vf', 'setpts=0',              # zero out presentation timestamps
                               '-f', 'rawvideo', 'pipe:'       # write raw frames to stdout
                               ]
        super().__init__(fragment, *args, **kwargs)
        
    def setup(self, spec: OperatorSpec):
        spec.output("source")
        
    def start(self):
        # using subprocess and pipe to fetch frame data
        self.p = subprocess.Popen(self.ffmpeg_command, stdout=subprocess.PIPE, bufsize=10**8)

    @nvtx.annotate("compute", color="green")
    def compute(self, op_input, op_output, context):
        with nvtx.annotate("stdout.read", color="blue"):
            raw_bytes = self.p.stdout.read(self.width*self.height*self.n_channels)
        
        with nvtx.annotate("bytes_to_tensor", color="yellow"):
            tensor = cp.frombuffer(raw_bytes, cp.uint8)

            if tensor.size != self.buffer_size:
                # incomplete frame (e.g. ffmpeg exited or the stream ended): skip it
                return
            
            tensor = tensor.reshape(self.height, self.width, self.n_channels)
            
        entity = Entity(context)
        entity.add(hs.as_tensor(tensor))
        op_output.emit(entity, "source")

    def stop(self):
        self.p.kill()
        return super().stop()
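
For completeness, a minimal sketch of how an operator like this could be wired into an application; the sink operator, the SRT URL, and the frame size below are placeholders, and the CountCondition is only there so a profiling run terminates on its own (untested as written):

from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec


class FrameSinkOp(Operator):
    # placeholder sink: receives the entities emitted by the source and counts them
    def __init__(self, fragment, *args, **kwargs):
        self.count = 0
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.input("in")

    def compute(self, op_input, op_output, context):
        op_input.receive("in")
        self.count += 1


class SRTDecodeApp(Application):
    def compose(self):
        source = FFmpegSRTStreamSourceOp(
            self, "srt://127.0.0.1:9999", 1080, 1920, 3,
            CountCondition(self, 1000),   # stop the source after 1000 frames
            name="source")
        sink = FrameSinkOp(self, name="sink")
        self.add_flow(source, sink, {("source", "in")})


if __name__ == "__main__":
    SRTDecodeApp().run()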

When I was reproducing the 'multi-endoscopy app' demo, I had a similar need: feeding the real-time video stream captured from a USB camera into the input of the AI model. At that step I ran into the problem that YUYV cannot be converted, so I would like to follow your idea and use a piece of code to adapt the video stream to a suitable RGB format.
Have you solved this problem? If so, could you please share how you did it?

Feel free to use parts of my operator for your use case. I noticed, however, that hardware acceleration is not actually used for decoding the H.264 in the example above: I forgot to add the -c:v h264_cuvid flag. That said, there is no noticeable difference between CPU and GPU decoding.
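
For reference, inside __init__ the command list would then look something like this; the -c:v flag has to come before -i so it selects the decoder for the input stream:

        self.ffmpeg_command = ['ffmpeg',
                               '-nostdin',
                               '-max_delay', '0',
                               '-y', '-vsync', '0',
                               '-hwaccel_device', '0',
                               '-hwaccel', 'cuda',
                               '-fflags', 'nobuffer', '-flags', 'low_delay', '-strict', 'experimental',
                               '-c:v', 'h264_cuvid',           # explicitly select the NVDEC-based H.264 decoder
                               '-i', url,
                               '-pix_fmt', 'rgb24',
                               '-s', f'{width}x{height}',
                               '-vf', 'setpts=0',
                               '-f', 'rawvideo', 'pipe:'
                               ]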

Do you know by any chance whether it is possible to keep the decoded frames on the GPU using -hwaccel_output_format cuda and convert them directly to a CuPy array?

Regarding your issue with the color space conversion, I never came across this problem.
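
One thing that might work, though, is a small conversion step before the frame is emitted. Here is a hedged sketch using OpenCV on the host, assuming the camera delivers packed YUYV (YUY2) frames; the function name and buffer layout are just assumptions for illustration, I have not tried this in the endoscopy app:

import cv2          # opencv-python
import cupy as cp
import numpy as np


def yuyv_to_rgb(raw_bytes, height, width):
    # packed YUYV uses 2 bytes per pixel, so the buffer is height x width x 2
    yuyv = np.frombuffer(raw_bytes, dtype=np.uint8).reshape(height, width, 2)
    # OpenCV converts YUYV -> RGB on the CPU
    rgb = cv2.cvtColor(yuyv, cv2.COLOR_YUV2RGB_YUYV)
    # move the result to the GPU so it can be wrapped with hs.as_tensor()
    return cp.asarray(rgb)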