COMPARING A 300 WATT TITAN WITH AN 85 WATT IVY BRIDGE PROCESSOR IS BS.

AGREE OR DISAGREE? LET ME KNOW YOUR OPINION.

You can have performance per $ or performance per watt, and you compare on the application you need to do your job. So your title is BS, and using all caps is impolite.

Agreed. Compare using the magic of multiplication. :)

  • Equal power: Is a Titan better than 3.5 Ivy Bridge processors for your application?

  • Equal cost: Is a Titan better than 3 or 4 Ivy Bridge processors for your application?
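The "magic of multiplication" above can be sketched in a few lines. This is only a back-of-envelope illustration: the 300 W figure is the OP's claim (the spec cited later in the thread says 250 W), and the prices are ballpark assumptions, not quotes.

```python
# Rough equal-power / equal-cost ratios for the comparison above.
# 300 W is the OP's figure; prices are assumed for illustration only.
titan_watts, cpu_watts = 300.0, 85.0
titan_price, cpu_price = 1000.0, 300.0  # assumed USD

power_ratio = titan_watts / cpu_watts   # CPUs runnable in the same power budget
cost_ratio = titan_price / cpu_price    # CPUs buyable for the same money

print(f"equal power: 1 Titan vs {power_ratio:.1f} CPUs")
print(f"equal cost:  1 Titan vs {cost_ratio:.1f} CPUs")
```

With these inputs the equal-power ratio lands at roughly 3.5 CPUs per Titan, which is where the "3.5 Ivy Bridge processors" figure comes from.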

Plus, on some clusters one hour of one CPU core is billed the same as one hour of a GPU.

Tell you what: you get together the equivalent number of CPUs to match the Titan in wattage, and you and I can have a competition to see who has the faster code.

We can use three benchmarks: sorting 100 million floats, multiplying large dense matrices (let's say 10,000 x 10,000 floats), and a graph algorithm like BFS or Floyd-Warshall.
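For the dense-matrix leg of that contest, the work involved is easy to estimate: a square N x N matrix multiply costs about 2*N^3 floating-point operations. A quick sketch, using the ~1.2 teraflop SGEMM figure quoted elsewhere in this thread as the assumed rate:

```python
# Back-of-envelope cost of the proposed 10,000 x 10,000 GEMM benchmark.
N = 10_000
flops = 2 * N**3            # standard operation count for a square GEMM
gpu_tflops = 1.2            # assumed sustained SGEMM rate (quoted in-thread)
seconds = flops / (gpu_tflops * 1e12)
print(f"{flops:.1e} FLOPs, roughly {seconds:.2f} s at {gpu_tflops} TFLOP/s")
```

That is about 2e12 operations, i.e. under two seconds of GPU time at that rate, so the benchmark would really be measuring setup, transfers, and code quality as much as raw arithmetic.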

We can have a Google Hangout and see who can run the faster correct code. What do you say, SkybuK?

I will note that the GTX Titan is rated at 250W and that it includes 6 GB of high-speed memory along with the processor:

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications

I already have a code which relies heavily on FFT. I ran tests up to 800x800 matrices against an MPI version, and 1 Titan 2070 was equivalent to 60-80 cores. At 12 cores per CPU (AMD), that means about 6-7 CPUs at 90 W each, plus the electricity spent by the InfiniBand interconnect.
I think both performance per watt and performance per dollar are better on the Titan. I can do my job much better even with the limited number of GPUs at my disposal, and most of my work is now done with CUDA. I still have programs which run on the CPU; it just depends on the problem. I still use my MPI CPU code for problems which do not fit in 6 GB of RAM.
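The performance-per-watt claim in that FFT anecdote is easy to check with arithmetic. A sketch, assuming 12 cores and roughly 90 W per AMD CPU as stated, and the 250 W board power from the spec page cited earlier:

```python
# Power comparison for the FFT test above: one GPU vs 60-80 matched CPU cores.
cores_matched = (60, 80)
cores_per_cpu, watts_per_cpu = 12, 90
titan_watts = 250  # rated board power from the cited spec page

# Ordered so the division is exact: cores * W-per-CPU / cores-per-CPU.
cpu_watts = tuple(c * watts_per_cpu / cores_per_cpu for c in cores_matched)
print(f"CPU side: {cpu_watts[0]:.0f}-{cpu_watts[1]:.0f} W (plus InfiniBand)")
print(f"GPU side: {titan_watts} W")
```

So the matched CPU configuration draws roughly 450-600 W before counting the interconnect, i.e. about two or more times the GPU's rated power for the same throughput, which is consistent with the poster's conclusion.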

Also, while I know most on here do not play PC games, keep in mind that the Titan is really awesome for that purpose.

So in other words, you can do at least 1.2 teraflops in Sgemm() with the Titan, mine bitcoin at 380-450 MH/s, and then play Skyrim or Far Cry 3 at maximum settings and maximum resolution with FPS > 60.

SkyBuk, try playing Far Cry 3 using a CPU, and let me know how that goes.

The cited SGEMM performance number for GTX Titan looks much too low, but I don’t have one to do a quick test. Could you double check your data, please?

njuffa,

That was just a minimum guess for the Titan, as my current PC has a GTX 680 and Tesla K20c.

The GTX 680 Sgemm() is at about 1 teraflop and the K20c is about 1.3 teraflops for Sgemm().

Since I know that the Titan and the K20 have similar performance I low-balled the teraflop estimate by using my worst-case K20 numbers.

Those are low numbers because I have an old motherboard and am only getting half the bandwidth (PCIe 2.0 x8), but that will be replaced soon.

The above numbers are from the cuBLAS matrix-multiply sample in the CUDA 5.0 SDK.

If I use the CUDA-Z utility I get 2.2 teraflops single precision for the K20c, and 1.24 teraflops double precision.

For the GTX 680, the CUDA-Z utility gives me 1.98 teraflops single precision and 200 Gflops double precision. The 680 is used for games and video out, so I never need its double-precision capability.

The CUDA SDK sample apps are generally not designed for benchmarking purposes.

The host platform shouldn’t matter when measuring CUBLAS GEMM performance (my workstation here is five years old, obviously limited to PCI-e 2). I have a Tesla K20c. Using CUDA 5.5 and with dimensions of m=n=k=8192 I see the following SGEMM performance:

2650 GFLOPS transpose_a=N transpose_b=N (0.415 sec)
2670 GFLOPS transpose_a=N transpose_b=T (0.412 sec)
2180 GFLOPS transpose_a=T transpose_b=N (0.497 sec)
2210 GFLOPS transpose_a=T transpose_b=T (0.496 sec)

The above execution times are for the CUBLAS call followed by cudaThreadSynchronize(), as seen by the host code (i.e. all data is resident on the GPU and there are no copies). The GFLOPS numbers are based on the standard count of floating-point operations, 2*M*N*K, for GEMM. Obviously the performance will vary considerably with the three dimensions, but for performance comparisons it is customary to state the performance for large, square matrices as I have done here. I do not have CUDA 5.0 ready to try, but I would be surprised if the performance were much different from CUDA 5.5 for the cases above.
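The stated GFLOPS figures follow directly from that 2*M*N*K operation count divided by the measured wall time. A minimal sketch of the conversion, checked against the NN case above:

```python
# GFLOPS from GEMM dimensions and elapsed time, per the standard
# 2*M*N*K operation count described above.
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    return 2 * m * n * k / seconds / 1e9

# The NN case above: m = n = k = 8192 in 0.415 s comes out near 2650 GFLOPS.
print(f"{gemm_gflops(8192, 8192, 8192, 0.415):.0f} GFLOPS")
```

The same function reproduces the other rows to within rounding of the quoted times.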

Here is a benchmark brief stating SGEMM and DGEMM performance for the K20X; the numbers seem to jibe with the performance I am measuring on the K20c (which is not quite as powerful as the K20X):

http://www.nvidia.com/docs/IO/122874/K20-and-K20X-application-performance-technical-brief.pdf

Right now we are experiencing a MAJOR HEATWAVE in the Netherlands, probably the worst one I have ever experienced. I suspect hot air from the Middle East and hot air from North America via sea currents: two factors coming together to form a big heatwave.

It’s now clear that the Antec 1200 case with 3 inlet and 3 outlet fans + 1 in the power supply cannot cool an 85 watt AMD X2 3800+ (dual core) running at 2.0 GHz per core. It can run at 2.0 GHz, but then it would fry the motherboard. The Winfast motherboard has temperature sensors and will shut down the entire computer if the temperature goes over 50 degrees Celsius. Outside and inside my apartment it’s now 28.5 degrees Celsius, according to the shitty clock on my desktop. I really have to go buy a better thermometer, lol… then again, the weather forecast more or less says the same thing… the real temperature might be 30 degrees or so. I might call my mom later to verify, but she’s not in the city, where it’s a few degrees hotter. But OK.

Back to the story… I had to underclock the CPU, and I also decided to underclock the GPU just to be on the safe side. Only then could the system temperature remain well below 50 degrees Celsius.
Right now it hovers around 40 degrees Celsius. The GPU still heats up to 52 degrees or so… and this is a GT 520… passively cooled, probably the lowest-watt DX11 GPU… something like 30 watts or so? And even this thing runs hot?! Weird.

Anyway… I don’t believe for a second that the Titan will run at 300 watts continuously… and even if it did, I do not believe it can be cooled properly by any air-cooled case under these heatwave conditions.

So at some point one must call BS on the whole thing… NVIDIA might as well create a 10,000 watt GPU and claim that it’s so much faster than the CPU… but in my mind it plays no role if it cannot be cooled properly.

A slight point of critique too: the Haswell processor is out already. I guess the CUDA C Programming Guide was released before that processor came out… but it’s the latest from Intel and probably a better candidate to compare against the TITAN.

And I shall end this post with a funny note:

“A dead Titan is NO GOOD TITAN”

and another one:

“A fried Titan is NO GOOD TITAN”

and a last one:

“A fried PC makes a BAD TITAN” :)

(Great… it’s just started raining, with a little bit of thunder… looks like it’s going to be a good long rain shower… the temperature is already down 1.5 degrees. Quite an odd experience… feeling my cheeks burn from the heat while cold air blows against them… the air is still hot :) but getting cooler ;) the wind is picking up too… good thing…)

Whoops, still getting used to this new forum… pressed the wrong button (quote instead of edit) to correct a typo.

Obvious troll.
If not trolling:

I’ve had 8 GPUs running the same program in one box, continuously, for 48 hours.

You are an obvious troll. If you had 8 Titans running in one box, your house would be on fire, lol.

I am running FFT code on a Tesla K20 with a 10000x10000 matrix, using almost all 5 GB of the GPU’s RAM. The power consumption shown by nvidia-smi is 137 W. I do not think there is any real (useful) CUDA program which would push power consumption to the maximum (i.e., use the GPU 100%).
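A rough footprint estimate shows why a transform of that size fills the card. This is only a sketch under assumptions, since the actual code isn't shown: it assumes double-precision complex data (16 bytes per element); out-of-place transforms and cuFFT work areas then need additional buffers of similar size, which is how the total can approach 5 GB.

```python
# Estimated size of ONE 10,000 x 10,000 complex-double array (assumption:
# the FFT code mentioned above uses complex128; not confirmed by the post).
n = 10_000
bytes_per_elem = 16              # complex128 = 2 x float64
gib = n * n * bytes_per_elem / 2**30
print(f"one array: {gib:.2f} GiB")  # extra buffers/workspaces come on top
```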

Please tell me more about your setup. I want to build a CUDA computing server for our lab.

I’ll send a pic tomorrow if you really want. It’s actually 7 Titans and one K20X. It runs at about 40 C.

Yes, pics please!! (And machine specs too, if possible)

Hey, sorry, I was away. I’m just about to swap the K20X out for a Titan (the Titan actually performs better for my current code). I’ll upload a pic when it’s all nicely implemented :)

It’s not cheap; 8-GPU motherboards are hard to come by. Further, I needed to maximize P2P transfer rates, so it’s using the Romley architecture. This is the Tyan barebones (motherboard, case, fans, PSU) that it’s currently using:
http://www.sabrepc.com/p-3748-tyan-b7059f77av6r-ft77a-b7059-8-gpu-4u-server-barebone.aspx
PLENTY of cooling (when it turns on it’s about 100 dB, and it idles at about 60 dB!!!). It was a bit difficult to set up, and there were definitely a few teething problems: it needs 3 power sockets, ideally on 3 different power boards, and you must use the proprietary onboard graphics, as we were unable to get output from any of the Titans.

On top of that:
2* Intel Xeon E5-2620 2.00 GHz 15 MB cache 7.20 GT/s LGA 2011 six-core processor
8* 4 GB 240-pin DDR3 SDRAM ECC registered DDR3 1600 MHz server memory
1* 2 TB 7200 RPM 64 MB cache 3.5 in SATA enterprise-class HDD
8* GeForce GTX Titan 6 GB GDDR5 384-bit PCIe 3.0
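A quick tally of the nameplate power for that parts list makes the "3 power sockets" requirement plausible. A sketch using rated TDPs (actual draw varies with load, and this ignores drives, fans, and PSU losses):

```python
# Rough power budget for the 8-GPU box described above, from rated TDPs.
gpus = 8 * 250        # GTX Titan is rated at 250 W board power each
cpus = 2 * 95         # Xeon E5-2620 TDP is 95 W
total = gpus + cpus
print(f"~{total} W before drives, fans, and PSU losses")
```

At well over 2 kW, that is more than a single typical wall circuit comfortably supplies, hence spreading the load across several sockets.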

Drool…

Very nice, but why no SSD drive(s), at least for the operating system?
It is amazing what a difference they make.

Any chance for a pic?