Performance Differences between RTX 3080 and Nvidia T4 GPUs in a DeepStream Application

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.1.1
• TensorRT Version: as shipped in the latest NGC DeepStream Triton image
• NVIDIA GPU Driver Version (valid for GPU only): 515
• Issue Type (questions, new requirements, bugs): Questions

I currently run my software on two computers: a development machine at home with an RTX 3080, and a “production” server with two NVIDIA T4s.

I have been benchmarking a YOLOv7 engine generated with trtexec on both machines, and I’m seeing some odd numbers.

On the 3080 I get 581 fps with a batch size of 8, while on a single T4 I only get 172 fps with the same batch size.

This is counterintuitive to me, since T4s are meant for data centers and servers and are also much more expensive. Is there a reason for this? Or does the 3080 make more sense for my use case?

I forgot to mention that the engine is FP16.
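
To put the two measurements side by side, here is a minimal Python sketch that derives the speedup and the implied per-batch latency from the fps figures quoted above. The two-T4 aggregate is an assumption (independent engine copies on each card with linear scaling), not something measured in this thread, and the names used are only illustrative.

```python
# Throughput figures reported in the thread (frames per second, batch of 8, FP16).
FPS_RTX_3080 = 581.0
FPS_T4 = 172.0
BATCH_SIZE = 8
NUM_T4 = 2  # the production server has two T4s


def batch_latency_ms(fps: float, batch_size: int) -> float:
    """Average time to process one batch, in milliseconds, implied by the fps."""
    return batch_size / fps * 1000.0


# How much faster the 3080 is than a single T4 on this engine.
speedup = FPS_RTX_3080 / FPS_T4

# Assumption: each T4 runs its own copy of the engine and throughput adds up
# linearly. Real multi-GPU scaling is usually somewhat lower.
t4_aggregate_fps = NUM_T4 * FPS_T4

print(f"RTX 3080 vs one T4: {speedup:.2f}x")
print(f"RTX 3080 batch latency: {batch_latency_ms(FPS_RTX_3080, BATCH_SIZE):.2f} ms")
print(f"T4 batch latency: {batch_latency_ms(FPS_T4, BATCH_SIZE):.2f} ms")
print(f"Two T4s (assumed linear scaling): {t4_aggregate_fps:.0f} fps")
```

Even under the optimistic linear-scaling assumption, the two T4s together stay below the single 3080 for this particular engine.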

Hi @madisi98
According to “Tesla T4 vs GeForce RTX 3080 [in 3 benchmarks]”, the RTX 3080 is an Ampere-architecture GPU, while the T4 is a Turing-architecture GPU. The RTX 3080 has far more CUDA cores, newer-generation Tensor Cores, and a higher GPU clock.
So I think it’s expected that the RTX 3080 has higher TOPS than the T4.
But the T4 is a data center card: it has a longer lifetime and ECC support.
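
As a rough sanity check of the reply above, the sketch below estimates peak FP32 throughput from core count and boost clock. The spec values (2560 CUDA cores at about 1.59 GHz for the T4, 8704 CUDA cores at about 1.71 GHz for the RTX 3080) are taken from public spec sheets and are not stated in this thread, so treat them as assumptions. FP32 peak is also only a proxy here, since the engine in question runs in FP16 on Tensor Cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    name: str
    cuda_cores: int
    boost_clock_ghz: float

    def peak_fp32_tflops(self) -> float:
        # One fused multiply-add per CUDA core per clock counts as 2 FLOPs.
        return self.cuda_cores * 2 * self.boost_clock_ghz / 1000.0


# Assumed spec-sheet values, not figures from the thread.
T4 = GpuSpec("Tesla T4", cuda_cores=2560, boost_clock_ghz=1.59)
RTX_3080 = GpuSpec("GeForce RTX 3080", cuda_cores=8704, boost_clock_ghz=1.71)

theoretical_ratio = RTX_3080.peak_fp32_tflops() / T4.peak_fp32_tflops()
measured_ratio = 581.0 / 172.0  # fps figures reported in the question

for gpu in (T4, RTX_3080):
    print(f"{gpu.name}: {gpu.peak_fp32_tflops():.2f} TFLOPS (theoretical FP32 peak)")
print(f"Theoretical ratio: {theoretical_ratio:.2f}x, measured ratio: {measured_ratio:.2f}x")
```

Under these assumptions the theoretical gap is of the same order as the measured one, which is consistent with the explanation that the difference comes from the hardware rather than from a misconfiguration.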

Right. I also see that the 3080 can only run 3 H.264 encoding sessions at a time, while the T4 has no limit on the number of concurrent encoding sessions.

Why is that, if the 3080 is the more capable card?

Sorry, I don’t get your question.

I don’t understand why the T4 can run more video encodes at once than the 3080, since the latter is capped at 3.

Here is the info - NVENC Application Note :: NVIDIA Video Codec SDK Documentation