RTX cards benchmarks - how to compare

I have looked around and cannot find a definitive, simplified layman's comparison of AI-specific benchmarks for these cards. By AI I mean RTX cards that can execute AI-specific jobs such as running Ollama models, YOLO/Darknet, etc., and how these cards compare when executing such workloads.

It would be great, for instance, if one could understand, again in the simplest terms possible:

  • why Nvidia releases new generation cards and how they perform, e.g. GTX 1060 vs RTX 2060 vs 3060 vs 4060 … 2070 vs 3070 vs … etc.
  • how these perform on a relatively basic model/workload, e.g. a simple Ollama model job
  • how big a job/model they can handle, and how fast; for instance, card XYZ can run ollama deepseek-r1:latest and return an answer in 3 seconds, vs deepseek-r1:70b in 10 seconds, or other parameters if possible.
  • I have previously looked at sites such as "RTX 5070 vs RTX 2070 [6-Benchmark Showdown]", but I was told by peers on IRC Libera #hardware that those benchmarks are not reliable. Does Nvidia provide anything to make such a comparison before evaluating a card purchase?
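For reference, the kind of measurement I have in mind is tokens per second. The sketch below assumes the response fields Ollama's `/api/generate` endpoint documents for non-streaming calls (`eval_count`, tokens generated, and `eval_duration`, nanoseconds spent generating); the sample numbers are made up:

```python
# Sketch: derive tokens/second from an Ollama /api/generate response dict.
# Assumes the documented non-streaming fields eval_count and eval_duration.

def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama API response."""
    eval_count = response["eval_count"]           # tokens generated
    eval_duration_ns = response["eval_duration"]  # generation time, nanoseconds
    return eval_count / (eval_duration_ns / 1e9)

# Made-up example: 128 tokens generated in 2 seconds -> 64 tok/s
sample = {"eval_count": 128, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 64.0
```

A number like this would let two cards be compared on the same model and prompt.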

I do not think there is an official comparison available for that specific use case.


I would choose a card according to five main parameters:

  • exclude data center GPUs that are not compatible with your case or PC architecture
  • size of GPU memory: the model has to fit
  • speed of tensor cores
  • speed of GPU memory
  • PCIe interface (e.g. 2.0, 3.0, 4.0, 5.0)

(for memory size, be careful with a few in-between GPUs like the Tesla K80, which consists of two GPUs with 12 GB each rather than one with 24 GB)
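To make the "model has to fit" rule concrete, here is a rough back-of-the-envelope sketch. The bytes-per-parameter values are the usual quantization sizes; the 20% overhead factor for KV cache and runtime is an assumption of mine, so treat the results as estimates, not guarantees:

```python
# Rough VRAM estimate (in GB) for running an LLM: weights * overhead.
# Bytes per weight for common quantizations; the overhead factor is assumed.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def vram_estimate_gb(params_billions: float, quant: str = "q4",
                     overhead: float = 1.2) -> float:
    """Approximate GPU memory needed to run the model."""
    weights_gb = params_billions * BYTES_PER_PARAM[quant]  # 1e9 params * bytes/param
    return weights_gb * overhead

# A 7B model at 4-bit: ~4.2 GB -> fits an 8 GB card
print(round(vram_estimate_gb(7, "q4"), 1))
# A 70B model at 4-bit: ~42 GB -> needs a 48 GB card or multiple GPUs
print(round(vram_estimate_gb(70, "q4"), 1))
```

This is also why a 70b tag on an Ollama model rules out most consumer cards regardless of how fast their tensor cores are.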

Of course you can also look at:

  • the possibility to boost or overclock, and also the base clock for long-running models
  • GPU generation and supported matrix formats, since you may use models that rely on them:
    • since Ampere, sparse matrices are supported
    • since Ada Lovelace, 8-bit floating-point (FP8) is supported
    • since Blackwell, 4-bit and 6-bit floating-point (FP4/FP6) are supported (Hopper, like Ada, tops out at FP8)
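The generation/format pairing above can be expressed as a small lookup table. This is an illustrative sketch that assumes support is cumulative (each generation inherits the older formats); consult NVIDIA's documentation for the authoritative support matrix:

```python
# Illustrative map of which reduced-precision formats each NVIDIA
# architecture generation introduced. Simplified sketch, not authoritative.
ARCH_ORDER = ["volta", "turing", "ampere", "ada", "hopper", "blackwell"]
INTRODUCED = {
    "volta": ["fp16"],
    "turing": ["int8", "int4"],
    "ampere": ["bf16", "tf32", "2:4 sparsity"],
    "ada": ["fp8"],
    "hopper": [],           # fp8 as well, already listed under Ada
    "blackwell": ["fp4", "fp6"],
}

def supported_formats(arch: str) -> list[str]:
    """All formats available on `arch`, assuming cumulative support."""
    idx = ARCH_ORDER.index(arch)
    formats: list[str] = []
    for a in ARCH_ORDER[: idx + 1]:
        formats.extend(INTRODUCED[a])
    return formats

print(supported_formats("ada"))  # fp8 plus everything older
```

A quantized model built for FP8, for example, would want an Ada-or-newer card.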

This article may also be of interest.