Capturing RTSP frames using FFmpeg with hardware acceleration

GPU - GTX 1050, 3 GB
CPU - Core i5 9th Gen, 16 GB RAM, Thread(s) per core: 2, Core(s) per socket: 4
Stream -
Resolution - 1920x1080
FPS - 15
Bitrate - 1536 kbps

I'm trying to decode and encode RTSP streams using the Video Codec SDK.

There are two combinations -

1. Encode on CPU and decode on GPU
Works fine but uses a lot of CPU.
Pipeline -
ffmpeg_cmd = ["ffmpeg", "-y", "-vsync", "0", "-hwaccel_output_format", "cuvid", "-c:v", "h264_cuvid", "-i", RTSP_STREAM,
              "-f", "rawvideo", "-c:v", "h264_nvenc", "-pix_fmt", "yuv420p", "-"]

2. Encode and decode on GPU
Low CPU usage, but I receive bad, noisy frames.
Pipeline -
ffmpeg_cmd = ["ffmpeg", "-y", "-vsync", "0", "-hwaccel_output_format", "cuvid", "-c:v", "h264_cuvid", "-i", RTSP_STREAM,
              "-f", "rawvideo", "-pix_fmt", "yuv420p", "-"]
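For reference, the per-frame byte count that the reads below depend on is plain YUV 4:2:0 arithmetic (my numbers for this 1080p stream, nothing pipeline-specific):

# I420 layout: full-resolution Y plane, then quarter-resolution U and V planes.
w, h = 1920, 1080
frame_bytes = w * h + 2 * (w * h // 4)  # = w*h*3//2 = 3,110,400 bytes per frame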

One thing I observed: when I work with CPU encoding,

x = ffmpeg.stdout.read(w * h * 6 // 4)

grabs the data for exactly one frame each time and then moves on to the next steps of the YUV-to-RGB conversion (yuv_2_rgb).
With GPU streaming, however, the pipeline starts receiving bad data after 300-500 frames.
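One thing worth ruling out is a short read: if read() ever returns fewer bytes than one full frame, every subsequent frame is misaligned and decodes as noise from that point on. A minimal exact-length read helper (a sketch, assuming the same ffmpeg subprocess as in the code below):

def read_exact(pipe, n):
	"""Read exactly n bytes from a pipe, or return None on EOF."""
	buf = bytearray()
	while len(buf) < n:
		chunk = pipe.read(n - len(buf))
		if not chunk:  # EOF: ffmpeg exited or the stream dropped
			return None
		buf.extend(chunk)
	return bytes(buf)

x = read_exact(ffmpeg.stdout, w * h * 3 // 2)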

CODE -

import subprocess as sp
import numpy as np
import cv2


def yuv_2_rgb_math(ffmpeg):
	w = 1920
	h = 1080
	k = w * h
	ys = []
	us = []
	vs = []
	x = ffmpeg.stdout.read(w * h * 6 // 4)  # read the bytes of a single I420 frame
	# Split the buffer into the Y plane and the quarter-size U and V planes
	y = np.frombuffer(x[0:k], dtype=np.uint8).reshape((h, w))
	u = np.frombuffer(x[k:k + k // 4], dtype=np.uint8).reshape((h // 2, w // 2))
	v = np.frombuffer(x[k + k // 4:], dtype=np.uint8).reshape((h // 2, w // 2))
	# Upsample chroma to full resolution
	u = np.reshape(cv2.resize(np.expand_dims(u, -1), (w, h)), (h, w))
	v = np.reshape(cv2.resize(np.expand_dims(v, -1), (w, h)), (h, w))
	ys.append(y)
	us.append(u)
	vs.append(v)
	y = np.array(ys, dtype=np.float32)
	u = np.array(us, dtype=np.float32)
	v = np.array(vs, dtype=np.float32)
	# YCbCr -> RGB (note the minus sign on the U term of G)
	r = y + 1.371 * (v - 128)
	g = y - 0.338 * (u - 128) - 0.698 * (v - 128)
	b = y + 1.732 * (u - 128)
	result = np.stack([b, g, r], axis=-1)  # BGR channel order for OpenCV
	result = np.clip(result, 0, 255)
	return result.astype(np.uint8)

def main(ffmpeg_cmd):
	ffmpeg = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE, bufsize=10)
	while True:
		image = yuv_2_rgb_math(ffmpeg)         # shape (1, h, w, 3)
		image = np.concatenate(image, axis=0)  # drop the leading batch dimension -> (h, w, 3)
		cv2.imwrite('Input Feed.jpg', image)
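As an aside, OpenCV can do the whole I420-to-BGR conversion in a single call, which avoids the per-plane math above. A sketch, using the same frame bytes x:

frame = np.frombuffer(x, dtype=np.uint8).reshape((h * 3 // 2, w))  # stacked Y, U, V planes
bgr = cv2.cvtColor(frame, cv2.COLOR_YUV2BGR_I420)                  # (h, w, 3) uint8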

Have you solved this problem (where x grabs the data for each frame and moves on to the YUV-to-RGB conversion, but in GPU streaming the pipeline receives bad data after 300-500 frames)? I've run into the same issue.