You don’t need to take too much of a guess. Just read this.
It gives some pretty indicative numbers for GPU and CPU+GPU HPL, which is effectively what you are interested in (I am presuming you are talking about full rather than sparse factorization).
Has anyone played with converting Volkov’s code to doubles?
Ben
(Edit: Not meaning to be rude and ignore your comments.
avidday: That very well may be what we end up using. Just checking around.
figual: Ah, I see. Good to know the performance picks up pretty quickly around the Ns we’re considering)
Modifying Vasily’s code is probably the best option.
These are the results from HPL.
mpirun -np 1 ./run_linpack
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 2000
NB : 1280 1152 960 896 768 640 512 384
256 128
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Left
BCAST : 1ring
DEPTH : 1
SWAP : Mix (threshold = 256)
L1 : no-transposed form
U : no-transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
Assigning device 0 to process on node rank 0
DTRSM split from environment variable 0.520000
DGEMM split from environment variable 0.655000
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 1280 1 1 0.44 1.223e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0077577 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 1152 1 1 0.40 1.345e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0067703 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 960 1 1 0.30 1.755e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0061134 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 896 1 1 0.31 1.746e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0075184 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 768 1 1 0.28 1.889e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0071333 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 640 1 1 0.27 1.942e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0061972 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 512 1 1 0.26 2.080e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0056605 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 384 1 1 0.26 2.074e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0069768 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 256 1 1 0.26 2.049e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0073184 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 2000 128 1 1 0.32 1.680e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0053558 ...... PASSED
================================================================================
Finished 10 tests with the following results:
10 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
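For anyone who wants to sanity-check the Gflops column or pick the best NB without reading the log by eye, here is a small Python sketch. It assumes the usual HPL operation count of 2/3*N^3 + 3/2*N^2 and copies the ten result rows from the run above. The helper names (hpl_flops, parse_rows, best_block_size) are made up for illustration and are not part of HPL. Since the Time column is rounded to two decimals, the recomputed rate should only be expected to agree with the reported one to within a few percent.

```python
import re

# Result rows copied from the HPL run above: T/V, N, NB, P, Q, Time, Gflops.
HPL_ROWS = """\
WR10L2L2 2000 1280 1 1 0.44 1.223e+01
WR10L2L2 2000 1152 1 1 0.40 1.345e+01
WR10L2L2 2000 960 1 1 0.30 1.755e+01
WR10L2L2 2000 896 1 1 0.31 1.746e+01
WR10L2L2 2000 768 1 1 0.28 1.889e+01
WR10L2L2 2000 640 1 1 0.27 1.942e+01
WR10L2L2 2000 512 1 1 0.26 2.080e+01
WR10L2L2 2000 384 1 1 0.26 2.074e+01
WR10L2L2 2000 256 1 1 0.26 2.049e+01
WR10L2L2 2000 128 1 1 0.32 1.680e+01
"""

# One result row: variant, N, NB, P, Q, time in seconds, Gflops.
ROW_RE = re.compile(
    r"^(W\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.eE+-]+)$"
)


def hpl_flops(n):
    """Operation count assumed for an N x N solve: 2/3*N^3 + 3/2*N^2."""
    return (2.0 / 3.0) * n ** 3 + (3.0 / 2.0) * n ** 2


def parse_rows(text):
    """Pull N, NB, time and reported Gflops out of HPL result rows."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows.append({
                "n": int(m.group(2)),
                "nb": int(m.group(3)),
                "time": float(m.group(6)),
                "gflops": float(m.group(7)),
            })
    return rows


def best_block_size(rows):
    """Return the row with the highest reported Gflops."""
    return max(rows, key=lambda r: r["gflops"])


if __name__ == "__main__":
    rows = parse_rows(HPL_ROWS)
    for r in rows:
        # Recompute the rate from N and the (rounded) wall time.
        estimate = hpl_flops(r["n"]) / r["time"] / 1e9
        print("NB=%4d reported=%6.2f recomputed=%6.2f"
              % (r["nb"], r["gflops"], estimate))
    best = best_block_size(rows)
    print("best NB:", best["nb"])
```

On this run it picks NB = 512 at 20.80 Gflops, although everything from NB = 256 to 512 is within a couple of percent, so at N = 2000 the choice inside that range hardly matters.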