RTX cards benchmarks - how to compare

I have looked around and cannot find a definitive, simplified layman's comparison of AI-specific benchmarks for these cards. By AI I mean RTX cards that can execute AI-specific jobs such as running Ollama models, YOLO/Darknet, etc., and how these cards compare when executing such workloads.

It would be great, for instance, if one could understand, again in the simplest terms possible:

  • why Nvidia releases new generation cards and how they perform, e.g. GTX 1060 vs RTX 2060 vs 3060 vs 4060 … 2070 vs 3070 vs … etc.
  • how these perform on a relatively basic model/workload, e.g. a simple Ollama model job
  • how big a job/model they can handle, and how fast; for instance, card XYZ can run ollama deepseek-r1:latest and return an answer in 3 seconds, vs deepseek-r1:70b in 10 seconds, or other parameters if possible.
  • I have previously looked at sites such as "RTX 5070 vs RTX 2070 [6-Benchmark Showdown]", but I was told by peers on IRC Libera #hardware that those benchmarks are not reliable. Does Nvidia provide anything to make such a comparison before evaluating a card purchase?
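For reference, the kind of measurement I have in mind is tokens per second. The sketch below assumes the response fields Ollama's `/api/generate` endpoint documents for non-streaming calls (`eval_count`, tokens generated, and `eval_duration`, nanoseconds spent generating); the sample numbers are made up:

```python
# Sketch: derive tokens/second from an Ollama /api/generate response dict.
# Assumes the documented non-streaming fields eval_count and eval_duration.

def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama API response."""
    eval_count = response["eval_count"]           # tokens generated
    eval_duration_ns = response["eval_duration"]  # generation time, nanoseconds
    return eval_count / (eval_duration_ns / 1e9)

# Made-up example: 128 tokens generated in 2 seconds -> 64 tok/s
sample = {"eval_count": 128, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 64.0
```

A number like this would let two cards be compared on the same model and prompt.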

I do not think there is an official comparison available for that specific use case.


I would choose a card according to five main parameters:

  • exclude data center GPUs that are not compatible with your case or PC architecture
  • size of GPU memory: the model has to fit
  • speed of tensor cores
  • speed of GPU memory
  • PCIe interface (e.g. 2.0, 3.0, 4.0, 5.0)

(for memory size, be careful with a few in-between GPUs like the Tesla K80, which consists of two GPUs with 12 GB each rather than one with 24 GB)
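To make the "model has to fit" rule concrete, here is a rough back-of-the-envelope sketch. The bytes-per-parameter values are the usual quantization sizes; the 20% overhead factor for KV cache and runtime is an assumption of mine, so treat the results as estimates, not guarantees:

```python
# Rough VRAM estimate (in GB) for running an LLM: weights * overhead.
# Bytes per weight for common quantizations; the overhead factor is assumed.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def vram_estimate_gb(params_billions: float, quant: str = "q4",
                     overhead: float = 1.2) -> float:
    """Approximate GPU memory needed to run the model."""
    weights_gb = params_billions * BYTES_PER_PARAM[quant]  # 1e9 params * bytes/param
    return weights_gb * overhead

# A 7B model at 4-bit: ~4.2 GB -> fits an 8 GB card
print(round(vram_estimate_gb(7, "q4"), 1))
# A 70B model at 4-bit: ~42 GB -> needs a 48 GB card or multiple GPUs
print(round(vram_estimate_gb(70, "q4"), 1))
```

This is also why a 70b tag on an Ollama model rules out most consumer cards regardless of how fast their tensor cores are.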

Of course you can also look at:

  • the possibility to boost or overclock, and also the base clock for long-running models
  • GPU generation and supported matrix formats, since you may use models that rely on them:
    • since Ampere, sparse matrices are supported
    • since Ada Lovelace, 8-bit floating-point (FP8) is supported
    • since Blackwell, 4-bit and 6-bit floating-point (FP4/FP6) are supported (Hopper, like Ada, tops out at FP8)
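The generation/format pairing above can be expressed as a small lookup table. This is an illustrative sketch that assumes support is cumulative (each generation inherits the older formats); consult NVIDIA's documentation for the authoritative support matrix:

```python
# Illustrative map of which reduced-precision formats each NVIDIA
# architecture generation introduced. Simplified sketch, not authoritative.
ARCH_ORDER = ["volta", "turing", "ampere", "ada", "hopper", "blackwell"]
INTRODUCED = {
    "volta": ["fp16"],
    "turing": ["int8", "int4"],
    "ampere": ["bf16", "tf32", "2:4 sparsity"],
    "ada": ["fp8"],
    "hopper": [],           # fp8 as well, already listed under Ada
    "blackwell": ["fp4", "fp6"],
}

def supported_formats(arch: str) -> list[str]:
    """All formats available on `arch`, assuming cumulative support."""
    idx = ARCH_ORDER.index(arch)
    formats: list[str] = []
    for a in ARCH_ORDER[: idx + 1]:
        formats.extend(INTRODUCED[a])
    return formats

print(supported_formats("ada"))  # fp8 plus everything older
```

A quantized model built for FP8, for example, would want an Ada-or-newer card.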

This article may also be of interest.