Different memory consumption on Nvidia A100 and Nvidia T4

Hi,

The same RTSP stream (1280 x 720 @ 5 FPS) consumes a different amount of GPU memory (VRAM) on an A100 than on a T4.

I’m using the NVIDIA VideoProcessingFramework example SampleDecodeRTSP.py.

A100 40 GB → 480 MiB per stream
T4 16 GB → 148 MiB per stream
RTX 2070 Super 8 GB → 150 MiB per stream
GTX 1660 Super 6 GB → 120 MiB per stream
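For context, the decode path is essentially one decoder instance per camera. Below is a simplified sketch of that per-stream setup with PyNvCodec, not the exact sample code (SampleDecodeRTSP.py demuxes the RTSP stream through an FFmpeg subprocess and feeds raw packets to the decoder), and the camera URL is a placeholder:

# Minimal per-stream decode loop with PyNvCodec (VPF) - simplified sketch.
import PyNvCodec as nvc

GPU_ID = 0
URL = "rtsp://camera.example/stream"  # hypothetical camera URL

# One decoder instance per stream; each instance allocates its own
# decoded-picture buffers on the GPU, which is the per-stream VRAM cost.
nv_dec = nvc.PyNvDecoder(URL, GPU_ID)

decoded = 0
while decoded < 100:
    surface = nv_dec.DecodeSingleSurface()
    if surface.Empty():
        break  # end of stream or decode error
    decoded += 1

print(f"Decoded {decoded} surfaces on GPU {GPU_ID}")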

I started sizing this on an NVIDIA T4, which uses about 148 MiB of VRAM per stream, and extrapolated to the 40 GB of the A100. But when I tested on the A100, the per-stream VRAM usage turned out to be more than 3× higher (480 MiB).

I’m trying to calculate how many GPUs I’ll need to decode 3,600 cameras, and with these numbers I can’t run more than about 80 simultaneous RTSP streams on the A100.
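To make the arithmetic explicit, this is the rough capacity estimate I’m doing, assuming VRAM is the only limit (NVDEC throughput could cap the count even sooner):

# Back-of-the-envelope stream capacity from the measured per-stream footprints.
A100_VRAM_MIB = 40960
PER_STREAM_T4_MIB = 148    # measured on T4
PER_STREAM_A100_MIB = 480  # measured on A100

expected = A100_VRAM_MIB // PER_STREAM_T4_MIB   # ~276 streams if the T4 footprint held
actual = A100_VRAM_MIB // PER_STREAM_A100_MIB   # ~85 streams at the observed footprint

print(expected, actual)  # 276 85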

Is this behavior expected?

Regards,
Kevin

Fri May 19 17:41:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   52C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   56C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   54C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:00:07.0 Off |                    0 |
| N/A   52C    P0    26W /  70W |      0MiB / 15109MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

###########################################################################

Fri May 19 14:40:44 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI…    On   | 00000000:CA:00.0 Off |                    0 |
| N/A   26C    P0    32W / 250W |      4MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2276      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Hi there @kevin.zezel and welcome to the NVIDIA developer forums!

I can’t say whether the memory consumption you are seeing is expected or whether it is caused by implementation details of these samples.

But in terms of performance expectations, I recommend reading the NVDEC_Application_Note.pdf, which is part of the Video SDK download. There is a section on decode performance across different chip generations.

I hope that helps!