Processing camera samples from RTSP to Redis with ffmpeg

FrancoSelleski · May 6, 2022, 8:20pm

We are currently working on capturing images from cameras using an RTSP stream and storing them in a Redis database.
With up to 6 cameras (4MP @ 10 fps each), everything runs smoothly and we get 60 fps in total at our Redis database. But when we go to 9 cameras, the total stays at 60 fps instead of the 90 fps we expected (10 fps per camera).
Our network settings are OK: we can do the same task with CPU processing (cv2.VideoCapture) for up to 15 cameras and it works fine, but it consumes a lot of CPU, and we think we could manage our resources better by doing this processing on the GPU.
Our camera settings are:
h.265 codec
2688x1520
10 fps
6144 bit rate
We run this on 1 RTX 3090.
We use a self-compiled FFmpeg build with hevc_cuvid. Here’s the code we are running:

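import subprocess as sp

import numpy

# bufsize is assumed to hold exactly one bgr24 frame: image_y * image_x * 3 bytes.
pipe = sp.Popen(
    [
        "ffmpeg",
        "-y",
        "-loglevel", "error",
        "-vsync", "0",
        "-c:v", "hevc_cuvid",  # NVIDIA hardware HEVC decoder
        "-rtsp_transport", "tcp",
        "-i", config["device_url"],
        "-preset", "superfast",
        "-pix_fmt", "bgr24",
        "-f", "rawvideo",
        "-",  # write raw frames to stdout
    ],
    stdout=sp.PIPE,
    bufsize=bufsize,
)
while True:
    pipe_content = pipe.stdout.read(bufsize)
    if len(pipe_content) != bufsize:
        # Short read: ffmpeg exited or the stream ended, so stop instead of spinning.
        break
    self.store_frame(
        str(config["id"]),
        numpy.frombuffer(pipe_content, dtype="uint8").reshape(
            (image_y, image_x, 3)
        ),
    )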
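
For scale, the raw output implied by the settings above can be estimated with a short calculation. This is a back-of-the-envelope sketch, not a measurement: it assumes every frame is delivered as uncompressed bgr24 (3 bytes per pixel) at the full 2688x1520 resolution, and the function names are made up for illustration.

```python
# Camera settings taken from the post; everything else is an illustrative assumption.
WIDTH, HEIGHT, FPS = 2688, 1520, 10
BYTES_PER_PIXEL = 3  # bgr24


def frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size in bytes of one uncompressed frame."""
    return width * height * bpp


def raw_rate_bytes_per_s(cameras, fps=FPS):
    """Total uncompressed bytes per second that all pipes must carry."""
    return cameras * fps * frame_bytes()


if __name__ == "__main__":
    print(f"one frame: {frame_bytes()} bytes")
    for n in (6, 9, 15):
        print(f"{n} cameras: {raw_rate_bytes_per_s(n) / 1e9:.2f} GB/s")
```

One frame comes to 12,257,280 bytes, so nine cameras at 10 fps amount to roughly 1.1 GB/s of raw pixels passing through the pipes into Python, compared with about 0.74 GB/s for six.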

dsingalNV · May 25, 2022, 10:23am

Hi Franco,
There are some reasons you might be seeing lower performance than expected. I’ll address the three that are most probable.

Number of NVDEC chips: a 3090, being a consumer-class card, has only 1 NVDEC chip capable of 1 decode session, as opposed to something like an A16, which can host 8 concurrent sessions using 4 chips capable of 2 sessions each.
Performance penalty of copying data between GPU memory and system memory via the PCIe interface: if you add the options -hwaccel cuda -hwaccel_output_format cuda to your ffmpeg pipeline, e.g.:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -c:v hevc_cuvid -i output.mp4 -pix_fmt bgr24 -benchmark -f null -
the decoded frames are kept in GPU memory. Without these options, the decoded raw frames would be copied back to system memory via the PCIe bus. Since you’re streaming data in, you could be saturating the PCIe bandwidth by copying frames to the GPU, decoding them, and then sending them back out to be read via numpy.
Numpy: since numpy uses system memory, using something like cupy, which can access the data in GPU memory directly, would reduce the memory copy from GPU memory to system memory.
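
One way to check the PCIe point is to time the decode alone, with and without the hwaccel options, before any frames are piped into Python. The sketch below only builds the ffmpeg argument list. The helper name and its parameters are hypothetical, and it leaves out -pix_fmt bgr24 from the example command so that only decoding is measured, which is a simplification of mine rather than part of the advice above.

```python
import shlex


def build_decode_benchmark_cmd(source, keep_on_gpu=True, seconds=None):
    """Build an ffmpeg command that decodes `source` and discards the output.

    keep_on_gpu=True adds the -hwaccel options so decoded frames are not
    downloaded to system memory; seconds, if given, caps the run with -t
    (useful for a live RTSP source that never ends).
    """
    cmd = ["ffmpeg"]
    if keep_on_gpu:
        cmd += ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"]
    cmd += ["-c:v", "hevc_cuvid"]
    if source.startswith("rtsp://"):
        cmd += ["-rtsp_transport", "tcp"]
    cmd += ["-i", source]
    if seconds is not None:
        cmd += ["-t", str(seconds)]
    # -benchmark prints timing at the end; -f null discards the decoded frames.
    cmd += ["-benchmark", "-f", "null", "-"]
    return cmd


if __name__ == "__main__":
    for on_gpu in (True, False):
        print(shlex.join(build_decode_benchmark_cmd("output.mp4", keep_on_gpu=on_gpu)))
```

Running both printed commands on the same input, for example with subprocess.run, and comparing the reported times should give a rough idea of how much the copy back to system memory costs on a given machine.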

For future reference, our technical blog on GPU-accelerated transcoding and the technical documentation are great resources:
https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/
https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/
