benchmarking GPUs

Is there a program available that measures the GFLOP rating of Nvidia GPUs? I’ve downloaded SDK 2.0 but can’t find a program that does this.

Thanks in advance. :ph34r:

I posted a simple gflops test here a while ago:

Your mileage may vary!

Thanks! I’ll have a look at it. Much appreciated. :ph34r:

Got it working. Ran it on the Tesla C870 and got 319 GFLOPS (benchmarking only one GPU). The theoretical peak is 430 GFLOPS, so I suppose that’s not too bad.
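As a sanity check on that result, the measured-to-peak ratio works out to about 74% (a quick sketch; both figures are just the ones quoted above in this thread):

```python
# Efficiency of the measured rate against the quoted theoretical peak.
measured_gflops = 319.0   # measured on the Tesla C870
peak_gflops = 430.0       # theoretical peak quoted above
efficiency = measured_gflops / peak_gflops
print(f"{efficiency:.1%}")  # -> 74.2%
```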

Cheers :thumbup:

Keep in mind that while the word “GFLOPS” gets used a lot, there are really only two things it is commonly measured and quoted for: theoretical peak performance, and LU matrix decomposition (which is mostly matrix-multiply, plus a bunch of data shuffling), aka Linpack. You can try to apply the measure to any sort of code, but it’s not usually a meaningful exercise. Now, on the one hand, Linpack is only marginally relevant to answering the question of “what’s, really, the achievable performance on a real app”; on the other hand, an artificial MADding-registers-endlessly benchmark answers that question even less effectively (although the exercise has its uses).

So, in summary: the GPU can’t do 319 GFLOPS on Linpack (or even on a single-precision matrix-multiply), so you can’t call that a “sustained” figure and advertise it as such to other HPC people. The number is, really, an “adjusted theoretical” one, and it reveals that NVIDIA has been lying by counting hardware capability that even the most ideal code can’t access.

The numbers quoted in Tesla marketing materials are not “lies” but theoretical peaks. The same is done for other processors as well (for example, 102 GFLOPS for the Intel X5482 Harpertown Xeon). It’s just the peak issue rate.

Claiming that Linpack (or matrix-multiply) is the true measure of the sustained GFLOPS rate is not prudent. No one really cares about the GFLOPS number for apps other than their own, and the GFLOPS rate varies extremely widely from app to app.


I don’t think anyone measures the GFLOPS for their apps. Why would they? When optimizing an application, if you can cut a million operations out of your logic, you won’t care that the average ops per second goes down (e.g. because the instructions-per-memory-access ratio decreased).

GFLOPS gets measured for Linpack, if only because it’s a convenient reference point and there’s an organization that collects results from most of the powerful architectures. What’s interesting is that while Linpack (primarily, matrix-multiply) is not a complex algorithm, it’s nuanced enough to reflect the design of the underlying architecture, the difficulty of optimizing for it, and the time that’s gone into optimizing its libraries. Wouldn’t an impartial observer say that maybe those same factors show up when benchmarking CUBLAS performance?
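For reference, the way a GFLOPS figure is conventionally derived from a timed matrix-multiply (the core of Linpack) is to count 2·n³ flops for an n×n multiply (one multiply and one add per inner-product term) and divide by wall time. A minimal sketch; the one-second timing below is purely illustrative, not a measurement:

```python
def gemm_gflops(n, seconds):
    """GFLOPS rate for an n x n matrix multiply: 2*n^3 flops
    (one mul and one add per inner-product term) over the elapsed time."""
    flops = 2.0 * n ** 3
    return flops / seconds / 1e9

# Illustrative only: a 4096x4096 SGEMM finishing in 1.0 s would be ~137 GFLOPS.
print(round(gemm_gflops(4096, 1.0)))  # -> 137
```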

Re: the un-measurable GFLOPS: what’s the point of dredging up ancient history? But if we must: NVIDIA quoted ops that simply could not be used from CUDA, either via CUDA C or via PTX. Maybe you’re saying NVIDIA is allowed to quote them because a few shaders for several high-profile games, which its internal team rewrote in assembly, had used them? What you guys pulled on the G80 is nothing at all like what Intel does with Xeons; I don’t know why you’d even say that. The Xeon’s 102 is just its 4-wide SSE times 3.2 GHz times four cores times two ops per cycle (a multiply and an add). Intel doesn’t even try to count the old x87 unit (which still works, btw), even though a trivial instruction pump could hit all the SSE units and the FP co-processor too. That’s why I said the code in the other thread gives basically the true theoretical figure.
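The Harpertown arithmetic above does check out; here it is spelled out (a sketch using exactly the factors from the post):

```python
# Theoretical peak for the Harpertown Xeon X5482, per the breakdown above:
# 4-wide SSE, one packed mul + one packed add per cycle, 3.2 GHz, 4 cores.
sse_width = 4
ops_per_cycle = 2        # a packed multiply and a packed add each cycle
clock_ghz = 3.2
cores = 4
peak_gflops = sse_width * ops_per_cycle * clock_ghz * cores
print(peak_gflops)  # -> 102.4
```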

I should also say that the GT200 marketing doesn’t try to pull this stuff anymore. And the G80 episode was much less egregious than when Sony boldly announced its PS3 could do 2 teraflops by counting texture interpolation. Then again, that GPU was also made by NVIDIA :P

Not sure what you’re referring to. All I said in my post was that the numbers you’re unhappy with are purely theoretical peaks - the hardware issue rate. Applications do not reach the hardware issue rate on any architecture. So I’d say it’s fair to compare theoretical peak with theoretical peak, or sustained app rate with sustained app rate, across architectures. I don’t recall us comparing our theoretical peak against another architecture’s app rate.


Couldn’t one just interleave linearly interpolated texture lookups with a barrage of multiply/add instructions to get close to the theoretical peak? Even if the computation weren’t meaningful, it would hit the advertised FLOPS.

Texture interpolation is never counted as flops, even by NVIDIA, because it is not programmable. What was being counted was something else, something that was even less accessible from CUDA.

The thing is, even when counting theoretical FLOPS, people typically use judgement, because even this ‘theoretical’ number is supposed to mean something relevant. paulius, it’s not just an opportunity to put the nicest figure you can find between the couch cushions up on a slide.

I guess what I’m saying is that your theoretical peak can’t be fairly compared to Intel’s theoretical peak. Does that make sense?