Titan RTX and Titan V

I primarily do numerical work with NVIDIA GPUs. Can someone give information on the relative performance of the Titan RTX and Titan V? What are the advantages and disadvantages of each? Are there specific applications where one should be preferred over the other? Based purely on price it would seem that the Titan V should be preferred for compute, but the 24 GB of memory on the RTX is also compelling.

I have not used either of these, but when soliciting this kind of feedback, it usually makes sense to name specific applications you intend to run. “numerical work” is quite vague. Have you profiled your applications? What are the typical bottlenecks you have encountered? What GPU do you currently use?

Are your applications limited by memory size, memory throughput, or computation? If the latter, is the computation mostly floating-point or integer? If floating-point: mostly double precision, single precision, half precision?

Sorry for not clarifying.

In my application we mostly work with FP32. Memory throughput and computation are both important, so I wanted to see whether the Titan V is better in both of those categories. Sometimes I am constrained by memory size, but that is not my main concern.

I currently have a Titan V, but wanted to evaluate the RTX for use in other workstations.

Titan V will be a clear win when your application is bound by FP64 throughput. Unless you are doing a lot of double-precision matrix multiply work, that may not be a likely description of your use case. And as you’ve already pointed out, Titan RTX, at 24 GB, has twice the memory of Titan V. In most other respects the two cards should be roughly comparable. (Another clear differentiator would be if you are interested in ray-tracing performance, using either an industry-standard ray-tracing API or OptiX. In that case, Titan RTX is the clear win.)

Top level specs comparison (peak theoretical):

         Titan RTX    Titan V
FP64:      0.5          7.5       TFlop/s (approximate, based on boost clock)
FP32:       16           15       TFlop/s (approximate, based on boost clock)
FP16:       32           30       TFlop/s (approximate, based on boost clock)
TCU:       576          640       TensorCore units
TCFP:        *          110       TFlop/s mixed-precision TensorCore throughput
MemBW:     672          652       GB/s
MemSz:      24           12       GB

The data presented here are assembled from other internet resources and may contain errors. This is not a statement of specification from NVIDIA. You should independently confirm any expected/desired characteristics prior to making any buying decisions. The numbers indicated above are peak theoretical calculated numbers for relative comparison purposes only. They are generally not achievable using real-world benchmarking.

https://www.techpowerup.com/gpu-specs/titan-rtx.c3311
https://www.nvidia.com/en-us/titan/titan-v/#

* Turing TensorCore capability varies substantially with the calculation mode. The base-level mode (FP16 precision) should be comparable to a Volta TensorCore per unit at equal clocks. For a description of the other modes, refer to the Turing architecture in-depth blog post:

https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
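The FP64/FP32/FP16 rows of the table can be roughly reproduced from core counts and boost clocks, since each FP32 core can retire one fused multiply-add (two flops) per clock. The sketch below assumes publicly listed figures that do not appear in this thread (4608 CUDA cores at 1770 MHz for Titan RTX, 5120 at 1455 MHz for Titan V, FP64 at 1/32 and 1/2 of the FP32 rate respectively, non-tensor FP16 at 2x FP32), so please verify them before relying on the output.

```python
# Rough peak-throughput sketch: each FP32 CUDA core retires one fused
# multiply-add (FMA = 2 floating-point operations) per clock.
# ASSUMPTION: core counts, boost clocks and precision ratios below come from
# public spec sheets, not from NVIDIA documentation quoted in this thread.

GPUS = {
    # name: (FP32 cores, boost clock GHz, FP64:FP32 ratio, FP16:FP32 ratio)
    "Titan RTX": (4608, 1.770, 1 / 32, 2),
    "Titan V": (5120, 1.455, 1 / 2, 2),
}


def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 TFlop/s = cores * 2 flops per FMA * clock (GHz) / 1000."""
    return cores * 2 * clock_ghz / 1000.0


def peak_table() -> dict:
    """Derive FP64/FP32/FP16 peak TFlop/s for each card from the assumptions above."""
    table = {}
    for name, (cores, clock, fp64_ratio, fp16_ratio) in GPUS.items():
        fp32 = peak_fp32_tflops(cores, clock)
        table[name] = {"FP64": fp32 * fp64_ratio, "FP32": fp32, "FP16": fp32 * fp16_ratio}
    return table


if __name__ == "__main__":
    for name, rates in peak_table().items():
        print(name, {k: round(v, 1) for k, v in rates.items()})
```

As the disclaimer above says, these are theoretical ceilings; sustained clocks under load, and therefore real throughput, will usually be lower.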

If you plan to do an evaluation anyhow, I would suggest simply getting a Titan RTX and benchmarking it against your existing Titan V with the apps you care about.

From general information on the internet, I would not expect a massive performance boost from the RTX, more like improvements in the 5% to 10% range, as the specs (FP32 and memory throughput) of the two GPUs are quite close. You get more memory with the RTX, and you pay for it with higher power consumption (280 W vs. 250 W).

This is not very surprising, as both GPUs appear to be manufactured in the same process, and are among the largest devices made in that process, at around 20 billion transistors.

@Robert_Crovella

I understand there isn’t really a way to know for sure unless I benchmark it, but according to your post it seems that the only reasons to choose the Titan V are its FP64 throughput and its Tensor Cores. Otherwise the Titan RTX wins across the board (if only by a little): memory capacity, memory throughput, FP32 performance, FP16 performance, and price are all better. Is that a valid assessment?

Generally speaking, yes. But the relative performance for a given application may vary a bit more than the paper specs suggest. As for the tensor cores, it seems to me the Titan V and Titan RTX are on par (the RTX has fewer units but runs them at higher clocks).
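The "fewer units at higher clocks" argument can be made concrete with a one-line ratio. This sketch assumes, per the TensorCore footnote above, that in base FP16 mode a Turing TensorCore does about the same work per clock as a Volta one; the unit counts come from the table, while the boost clocks (1770 MHz and 1455 MHz) are assumed public spec values.

```python
# ASSUMPTION: equal per-unit, per-clock TensorCore work in base FP16 mode,
# so relative throughput scales with (number of units) * (clock).
# Unit counts are from the spec table; boost clocks are assumed spec values.


def relative_tc_throughput(units_a: int, clock_a_ghz: float,
                           units_b: int, clock_b_ghz: float) -> float:
    """Ratio of card A's to card B's TensorCore throughput under the assumption above."""
    return (units_a * clock_a_ghz) / (units_b * clock_b_ghz)


# Titan RTX (576 units) vs. Titan V (640 units)
ratio = relative_tc_throughput(576, 1.770, 640, 1.455)
print(f"Titan RTX / Titan V TensorCore throughput ~ {ratio:.2f}x")
```

A difference of under 10% on paper is well within what sustained clock behavior (discussed next) can erase, which is consistent with calling the two cards on par.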

The two GPUs use different kinds of memory with different performance properties (access width, latency), so the performance of memory intensive code may not correlate directly with the specifications for theoretical memory throughput, depending on the application’s specific access pattern.

The differences in power consumption and cooling design (blower vs open fan) may result in different clock boosting behavior in a given enclosure. Open fan designs dissipate most of the heat into the case, while NVIDIA’s previous blower designs exhaust much of it through vents in the bracket. I read that the open fan design results in superior cooling in a well-ventilated case.

Can you expand on the differences between HBM2 and DDR6 memory? I see that the memory interface is listed as 3072-bit and 384-bit, respectively. What does that mean in terms of performance? Does one type of memory have higher latency than the other? I don’t have a good feel for how to compare these things.

Sorry, my in-depth understanding of DRAM technology stopped at early DDR devices. Practical performance of any kind of DRAM will also depend on the specifics of the memory controller, and NVIDIA does not make those available. FWIW the Titan RTX uses GDDR6, which presumably enables higher capacity at lower cost (note the “G”; for regular DDR we are still at DDR4 at this time).
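For the interface-width part of the question, peak bandwidth is simply bus width (in bytes) times the per-pin data rate. The per-pin rates in this sketch (about 1.7 Gbit/s for the Titan V's HBM2 and 14 Gbit/s for the Titan RTX's GDDR6) are assumed public spec values, not figures from this thread. The point is that HBM2 reaches similar bandwidth with a very wide but slow interface, while GDDR6 uses a narrow but fast one; the calculation says nothing about latency or access granularity.

```python
# Peak DRAM bandwidth = (bus width in bits / 8) * per-pin data rate.
# ASSUMPTION: per-pin data rates are public spec values
# (HBM2 ~1.7 Gbit/s per pin on Titan V, GDDR6 14 Gbit/s per pin on Titan RTX).


def peak_bandwidth_gbs(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits / 8 * gbit_per_pin


titan_v = peak_bandwidth_gbs(3072, 1.7)    # wide, slow interface (HBM2)
titan_rtx = peak_bandwidth_gbs(384, 14.0)  # narrow, fast interface (GDDR6)
print(f"Titan V: {titan_v:.1f} GB/s, Titan RTX: {titan_rtx:.1f} GB/s")
```

Both results line up with the MemBW row of the table, which is why the two cards look so close on paper even though the memory technologies differ.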

The way I look at it is that the theoretical memory bandwidth specifications suggest that the Titan RTX will be slightly faster than the Titan V on many memory-bound applications, but one could also encounter cases where it is slower than a Titan V. Especially when the hardware specs are close, it is not possible to say where in that spectrum a given application will show up.

Performance is also influenced by the code generated by the compiler. I haven’t looked at how much variation there is between code generated for Volta and code generated for Turing, but I would expect some differences, as GPU architectures are not binary compatible and code generators are re-tuned for each new architecture.

This is why it is important to benchmark one’s actual applications if one desires to maximize their performance.
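A minimal timing harness for such application-level comparisons might look like the sketch below. It is hypothetical: `run_once` stands for whatever invokes one representative run of your own application, and it is assumed to block until all GPU work has finished (for example by ending with a device synchronization), since asynchronous kernel launches would otherwise make wall-clock timings meaningless.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], None], warmup: int = 3, repeats: int = 10) -> float:
    """Return the median wall-clock time in seconds of run_once.

    ASSUMPTION: run_once blocks until all of its GPU work has completed.
    """
    for _ in range(warmup):  # discard early runs: JIT compilation, caches, clock ramp-up
        run_once()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    return statistics.median(times)  # the median is robust to occasional outliers


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a call into the real application.
    t = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median time: {t * 1e3:.2f} ms")
```

Running the same harness, with the same inputs, on both cards gives a speed-up factor for your workload that already includes the memory-system, clock-boosting, and code-generation effects discussed above.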

I didn’t list TensorCore in my top-level list of differentiators. The ones I listed were: FP64, memory size, and Ray Tracing. These are arguably all capability differentiators. If your code does large amounts of FP64 matrix multiply, there’s really no comparison between the two cards. If your code must run a single deep learning training scenario on a single GPU, and the model will not fit in 12GB (but will fit in 24GB), there’s really no comparison between the two cards. If your work involves modern-API-based raytracing, there’s really no comparison between the two cards.

I concur with njuffa. I personally don’t think Tensor Core is a reason to choose Titan V over Titan RTX, and I’m not sure why you reached that conclusion. If you’re basing a buying decision on the possibility that Titan V TC performance might be 10% higher than Titan RTX according to some measurement, then go for it. (I’m not sure that Titan V TC performance is higher than Titan RTX under any circumstances, and I’m quite sure that under some, arguably relatively obscure, circumstances Titan RTX TC performance could be almost infinitely higher than Titan V.)

Again, to concur with njuffa, the best basis for evaluation is benchmarks representative of workloads you care about. If you fully understand the comparison of Titan RTX TensorCore vs. Titan V TensorCore on your workloads, then you have far more information and far better basis for evaluation than I could possibly provide in a forum posting like this.

On paper, the TC performance between the two appears to me to be pretty close, at least for some workloads. I left the TC performance entry for Titan RTX blank for a reason.

What I’m stating here are my own personal viewpoints. These should not be considered to be formal recommendations from NVIDIA. I take no responsibility for your buying decisions. This information is offered on a best-efforts basis with no intent of warranty, nor any statement of suitability for a particular purpose.

This statement confuses me. I did a google search just now. On Amazon, Titan RTX appears to be available for $2500 whereas Titan V appears to be available for $3300. Maybe we are looking at different prices or I just don’t understand your comparison.

Sorry for the confusion.

I based my Tensor Core statement purely on the number of cores and the absence of a TCFP number for the RTX. I see that was a wrong assumption on my part.

My statement that “Based purely on price it would seem that the Titan V should be preferred for compute” was confusing; I apologize. I was just reasoning that a higher price usually indicates better performance. That is true in some cases, but between these two cards the difference seems to be more about capabilities than anything else.

I appreciate all the help!

Historically, NVIDIA has tended to charge more for the “high double-precision throughput” capability as a form of market differentiation. The Titan V has that feature, the Titan RTX does not.

While pricing is typically determined by what the market will bear rather than manufacturing costs (which set a lower limit), I note that Titan V has the larger die and also uses an expensive HBM2 stack, so presumably has higher manufacturing costs.

Given those considerations, it makes sense that I see the Titan V listed at around $3000 while the Titan RTX is listed at around $2500.
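To illustrate the point that the price premium tracks FP64 capability, this sketch divides the approximate prices quoted in this thread ($2500 and $3000) by the peak figures from the spec table. It is illustrative only: prices change, and peak numbers are not achievable in practice.

```python
# Dollars per unit of capability, using the approximate prices quoted in this
# thread and the peak figures from the spec table (illustrative only).

CARDS = {
    "Titan RTX": {"price": 2500, "FP32": 16.0, "FP64": 0.5, "GB": 24},
    "Titan V": {"price": 3000, "FP32": 15.0, "FP64": 7.5, "GB": 12},
}


def dollars_per(card: str) -> dict:
    """Price divided by each capability figure (TFlop/s or GB) for one card."""
    c = CARDS[card]
    return {k: c["price"] / v for k, v in c.items() if k != "price"}


for name in CARDS:
    print(name, {k: round(v) for k, v in dollars_per(name).items()})
```

On these numbers the Titan RTX is cheaper per FP32 TFlop/s and per GB, while the Titan V is more than ten times cheaper per FP64 TFlop/s.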

Some initial TensorFlow benchmark results showing performance of the Titan RTX vs the Titan V and other high-end GPUs can be found here:

https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/

I am not familiar with that site and their benchmarking procedures, so I am just posting the link here and leaving people to draw their own conclusions.

Probably most folks know this, but I’ll mention that FP32 training does not involve TensorCore usage, typically, and FP16 training does involve TensorCore usage, typically, on GPUs that have TensorCore support.

I also don’t know about the methodology used there. The barchart data and raw results seem plausible to me, but the percentages listed don’t seem to match the barchart data, by my way of thinking. It may be a language quibble.

As an example, for FP16, I don’t see how the Titan RTX could be “209.7% faster than the GTX 1080 Ti” and also “21.4% faster than the RTX 2080 Ti”

Alternative wordings that make more sense to me:

“109.7% faster than the GTX 1080 Ti”
“21.4% faster than the RTX 2080 Ti”
(preferred)

or:

“209.7% faster than the GTX 1080 Ti”
“121.4% faster than the RTX 2080 Ti”
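Under the usual convention, "X% faster" is (speed-up factor - 1) * 100, so a 2.097x speed-up is 109.7% faster, which matches the preferred wording above. The runtimes in this sketch are hypothetical and are not taken from the benchmark page.

```python
def speedup(time_old: float, time_new: float) -> float:
    """Speed-up factor: how many times faster the new run is than the old one."""
    return time_old / time_new


def percent_faster(factor: float) -> float:
    """Percent faster implied by a speed-up factor (2x -> 100% faster)."""
    return (factor - 1.0) * 100.0


# Hypothetical runtimes in seconds.
f = speedup(time_old=10.0, time_new=5.0)
print(f"{f:.1f}x speed-up = {percent_faster(f):.0f}% faster")
```

Quoting the speed-up factor directly avoids the off-by-100% ambiguity entirely.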

If people would just stick to speed-up factors, we’d all be better off. Usually confusion sets in when combining “faster” and “percent”. A GPU A that offers 2x the performance of GPU B is how many percent faster? Discuss. :-) Next up: What’s the correct way of averaging ratios?
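One common answer to the averaging question (a convention, not the only defensible one) is the geometric mean, because it gives consistent results regardless of which GPU is used as the baseline. The per-benchmark ratios below are hypothetical.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of speed-up ratios: exp(mean(log r))."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark speed-ups of GPU A over GPU B.
a_over_b = [2.0, 0.5]
b_over_a = [1 / r for r in a_over_b]

print(sum(a_over_b) / len(a_over_b))  # arithmetic mean 1.25: A looks faster
print(sum(b_over_a) / len(b_over_a))  # arithmetic mean 1.25: B looks faster too
print(geometric_mean(a_over_b), geometric_mean(b_over_a))  # both ~1.0: consistent
```

The arithmetic mean of ratios claims each GPU is 25% faster than the other; the geometric mean correctly reports parity.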

That said, the raw data in the tables looks plausible to me, in the absence of any hands-on experience benchmarking deep-learning apps.

While the Titan V looks like the winner for FP64, and is therefore the choice if you are doing HPC, you may want to expand your options to include Quadro. For “mission critical” HPC, where double precision is a must, ECC should also be a must. No Titan card comes equipped with error-correcting memory. There has been a big debate about the benefit of ECC for workstations using CPU-only computation. Many users of AutoCAD, SolidWorks, or other professional design packages probably ask themselves: “Do I really need Xeon / Quadro / ECC-level capability?” The answer is probably “no”.

It is a different matter if you are doing REAL critical double-precision computations where an error could be a disaster: flight control, CFD in the design of turbines, wings, or other aviation components, high-speed trading of billions of $$$, computational biology, etc. In that case you HAVE to spend the $$$ on Quadro or suffer the consequences.

With this caveat in mind, for deep learning the clear choice becomes the Titan RTX with 24 GB of DRAM. Given that neural nets are probabilistic anyway, the occasional data error simply disappears in the normal distribution of deep learning. That is also why training in 16-bit rather than 32-bit precision works.

HPC != Deep Learning

As you are aware, performance depends strongly on the application (type of compute, memory-bound vs. compute-bound) and its implementation.

As one data point for my application (Monte Carlo photon transport), here is a post I made last Friday:

https://groups.google.com/d/msg/mcx-users/PWa3B-7uwGE/1sFnpmbbAwAJ

The RTX 2080 card is about 20% slower than the Titan V, but 50% faster than the 1080 Ti. Please see the benchmark results page at

http://mcx.space/gpubench/

and search for “2019-2-22” for the newly added tests.

I do think the “50% faster than 1080Ti” figure is overstated, because the 1080 Ti used to be faster (with NVIDIA drivers 375-381 and CUDA 7.5 or earlier) but lost speed with later CUDA compiler/driver updates. We first observed this in 2016 but did not dig into it, so the slowdown has persisted to this day:

https://devtalk.nvidia.com/default/topic/925630/cuda-programming-and-performance/cuda-7-5-on-maxwell-980ti-drops-performance-by-10x-versus-cuda-7-0-and-6-5/post/4994759/#4994759

Hi, thanks for all the responses. It seems what I really need is a Quadro with high FP64 (double-precision) throughput, as we just need high-capacity computation, so the Quadro GP100 fits the bill. The catch is that it costs two to three times as much as the Titan V, which also delivers high FP64 throughput at about 7 TFlop/s. The Titan V is out of stock at the UK store, but I have seen some used ones for sale on eBay. I just need to find a suitable server chassis that can supply the 250 W it requires.