Different memory consumption on Nvidia A100 and Nvidia T4

Hi,

The same RTSP stream (1280 x 720 @ 5 FPS) consumes a different amount of GPU memory (VRAM) on an A100 than on a T4.

I’m using the NVIDIA VideoProcessingFramework example SampleDecodeRTSP.py.

A100 40 GB → 480 MiB per stream
T4 16 GB → 148 MiB per stream
RTX 2070 Super 8 GB → 150 MiB per stream
GTX 1660 Super 6 GB → 120 MiB per stream
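For context, the decode path is essentially one decoder instance per camera. Below is a simplified sketch of that per-stream setup with PyNvCodec, not the exact sample code (SampleDecodeRTSP.py demuxes the RTSP stream through an FFmpeg subprocess and feeds raw packets to the decoder), and the camera URL is a placeholder:

# Minimal per-stream decode loop with PyNvCodec (VPF) - simplified sketch.
import PyNvCodec as nvc

GPU_ID = 0
URL = "rtsp://camera.example/stream"  # hypothetical camera URL

# One decoder instance per stream; each instance allocates its own
# decoded-picture buffers on the GPU, which is the per-stream VRAM cost.
nv_dec = nvc.PyNvDecoder(URL, GPU_ID)

decoded = 0
while decoded < 100:
    surface = nv_dec.DecodeSingleSurface()
    if surface.Empty():
        break  # end of stream or decode error
    decoded += 1

print(f"Decoded {decoded} surfaces on GPU {GPU_ID}")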

I started sizing this on an NVIDIA T4, which uses about 148 MiB of VRAM per stream, and extrapolated to the 40 GB of the A100. But when I tested on the A100, the per-stream VRAM usage turned out to be more than 3× higher (480 MiB).

I’m trying to calculate how many GPUs I’ll need to decode 3,600 cameras, and with these numbers I can’t run more than about 80 simultaneous RTSP streams on the A100.
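To make the arithmetic explicit, this is the rough capacity estimate I’m doing, assuming VRAM is the only limit (NVDEC throughput could cap the count even sooner):

# Back-of-the-envelope stream capacity from the measured per-stream footprints.
A100_VRAM_MIB = 40960
PER_STREAM_T4_MIB = 148    # measured on T4
PER_STREAM_A100_MIB = 480  # measured on A100

expected = A100_VRAM_MIB // PER_STREAM_T4_MIB   # ~276 streams if the T4 footprint held
actual = A100_VRAM_MIB // PER_STREAM_A100_MIB   # ~85 streams at the observed footprint

print(expected, actual)  # 276 85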

Is this behavior expected?

Regards,
Kevin

Fri May 19 17:41:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   52C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   56C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   54C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:00:07.0 Off |                    0 |
| N/A   52C    P0    26W /  70W |      0MiB / 15109MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

###########################################################################

Fri May 19 14:40:44 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI…    On   | 00000000:CA:00.0 Off |                    0 |
| N/A   26C    P0    32W / 250W |      4MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2276      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Hi there @kevin.zezel and welcome to the NVIDIA developer forums!

I can’t say whether the memory consumption you are seeing is expected or whether it is caused by implementation details of these samples.

But in terms of performance expectations, I recommend reading the NVDEC_Application_Note.pdf, which is part of the Video SDK download. There is a section on decode performance across different chip generations.

I hope that helps!