peak computational throughput

Manalo · December 24, 2015, 11:00am

Dear all,
How can I find the peak computational throughput (GFLOP/s) of my installed GPU (GeForce GT 740m)?

Robert_Crovella · December 24, 2015, 2:32pm

Essentially all of the data you need to calculate that can be obtained from deviceQuery. I suggest posting the deviceQuery output from your GT 740m.

scottgray · December 24, 2015, 7:44pm

Using the FFMA instruction, a single CUDA core can compute 2 floating-point operations in 1 clock cycle: 1 for the multiply and 1 for the add. So peak FLOPS is just the number of CUDA cores times the clock frequency times 2. On Titan X (3072 cores at a 1000 MHz base clock) you have:

3072 * 2 * 10**9 = 6.144 * 10**12 FLOPS = 6.144 TFLOPS

The boost clock will let you compute more than that, but it typically can’t be sustained because of power or heat constraints.
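As a quick sanity check of the formula above, here is a minimal Python sketch. The function name `peak_sp_gflops` and its parameters are made up for illustration; the core count and base clock are the Titan X figures quoted in this post.

```python
def peak_sp_gflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak in GFLOPS.

    Assumes every core retires one FFMA (multiply + add = 2 FLOPs)
    per clock cycle, as described in the post.
    """
    flops_per_core_per_clock = 2
    # cores * MHz * 2 gives MFLOPS; divide by 1000 to get GFLOPS.
    return cuda_cores * clock_mhz * flops_per_core_per_clock / 1000.0


# Titan X: 3072 cores at a 1000 MHz base clock.
titan_x = peak_sp_gflops(3072, 1000)
print(f"Titan X: {titan_x / 1000:.3f} TFLOPS")  # prints "Titan X: 6.144 TFLOPS"
```

Passing the boost clock instead of the base clock gives the higher, usually unsustainable, figure.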

I believe the 740m is a Kepler part with 384 cores at 810 MHz. That’s 384 * 2 * 810 MHz = 622 GFLOPS. On Kepler some of the CUDA cores are shared between schedulers, and in practice it is not possible to utilize more than ~70% of them at any one time. So you should see sgemm benchmarks run at around 430 GFLOPS. The sgemm implementation in cuBLAS can run FFMAs just about as fast as the cores can process them (provided the matrix dimensions are big enough).
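The same arithmetic for the 740m, as a rough Python sketch. The 384 cores and 810 MHz are the numbers guessed in this post, not values read from a device, and the 0.70 utilization factor is only the ballpark quoted above; `sustained_gflops` is a hypothetical helper name.

```python
def sustained_gflops(peak_gflops: float, utilization: float = 0.70) -> float:
    """Scale a theoretical peak by an assumed achievable utilization.

    The 0.70 default is the rough Kepler figure from the post,
    not a measured or documented constant.
    """
    return peak_gflops * utilization


# GT 740m as assumed in the post: 384 cores, 810 MHz, 2 FLOPs per core per clock.
peak_740m = 384 * 810 * 2 / 1000.0      # theoretical peak in GFLOPS
estimate_740m = sustained_gflops(peak_740m)

print(f"peak: {peak_740m:.0f} GFLOPS, realistic sgemm: ~{estimate_740m:.0f} GFLOPS")
```

This lands in the same range as the "around 430 GFLOPS" quoted above; the exact value depends on what utilization you assume.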

Robert_Crovella · December 24, 2015, 10:06pm

And there is a corresponding divisor if you are referring to double-precision GFLOPS, which varies by GPU. On Titan X the divisor is 32 (6.144 / 32 = 0.192 DP TFLOPS), and on the 740m, if it is a Kepler part, the divisor should be 24 (622 / 24 = ~25.9 DP GFLOPS).
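A small Python sketch of the divisor step, using only the single-precision peaks and divisors quoted in this thread. `peak_dp_gflops` is a name invented for the example; look up the actual ratio for your GPU rather than relying on these two values.

```python
def peak_dp_gflops(sp_gflops: float, sp_to_dp_divisor: float) -> float:
    """Theoretical double-precision peak from the single-precision peak.

    The divisor is architecture specific; the values used below are
    the ones stated in this thread.
    """
    return sp_gflops / sp_to_dp_divisor


titan_x_dp = peak_dp_gflops(6144.0, 32)   # Titan X: SP peak 6.144 TFLOPS, divisor 32
gt740m_dp = peak_dp_gflops(622.08, 24)    # GT 740m (if Kepler): SP peak ~622 GFLOPS, divisor 24

print(f"Titan X DP: {titan_x_dp:.0f} GFLOPS")
print(f"GT 740m DP: {gt740m_dp:.1f} GFLOPS")
```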
