CUDA Cluster - HPL Help

kms3458 · September 23, 2013, 3:51am

Hello everyone,

I am currently taking part in an academic research project dealing with HPC. My team has built a two-node cluster in which each node has the following specs:

Node (x2)
8-core AMD CPU
3 x GTX 660 Ti
2 x 8GB RAM

I have successfully acquired and built the HPL that Nvidia provides as part of their developer program, but I am facing some issues getting it to run correctly.

When I execute the following command:

mpiexec -np 6 -hostfile hostfile ./run_linpack

I get the first part of what you would normally see with the HPL:

================================================================================
TestingHPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   25000 
NB     :     768 
PMAP   : Row-major process mapping
P      :       2 
Q      :       2 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :    Left 
BCAST  :   1ring 
DEPTH  :       1 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : no-transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

At this point, however, nothing new ever appears. Eventually I get a “Bad Pipe” error message, but I assume that is from my SSH session, and not the HPL.

I can verify that 6 instances of the xhpl executable are indeed running (three per node), and that they are consuming 100% of a CPU core each.

Here is my software stack:
Ubuntu Server 12.04.3
CUDA version 5.5
ACML 5.3.1
GCC 4.6.3
MPICH2 3.0.4 (I think that is the version)
hpl-2.0_FERMI_v15

Any information or help would be greatly appreciated! Please let me know if I can provide you with any additional information.

Thanks,
Kelly

mebersole · October 3, 2013, 2:26pm

Kelly,

The number of MPI processes should equal the number of GPUs, and this should equal P x Q. It looks like you have P x Q = 2 x 2, which is only 4, while you are launching 6 processes on 6 GPUs. I’d try changing P x Q to either 2 x 3 or 3 x 2 in HPL.dat.
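
As a rough illustration of that rule, here is a minimal Python sketch that checks a launch configuration before running. The function names `process_grids` and `check_launch` are hypothetical helpers invented for this example and are not part of HPL or MPICH; the numbers come from the post above (6 processes, 2 nodes, 3 GPUs per node).

```python
def process_grids(num_procs):
    """Return every (P, Q) pair whose product equals num_procs."""
    return [(p, num_procs // p)
            for p in range(1, num_procs + 1)
            if num_procs % p == 0]


def check_launch(num_procs, p, q, nodes, gpus_per_node):
    """Return a list of problems with the configuration (empty if consistent).

    Hypothetical helper: it only encodes the rule
    "MPI processes == number of GPUs == P x Q".
    """
    problems = []
    total_gpus = nodes * gpus_per_node
    if num_procs != total_gpus:
        problems.append(
            f"{num_procs} MPI processes but {total_gpus} GPUs in the cluster")
    if p * q != num_procs:
        problems.append(
            f"P x Q = {p} x {q} = {p * q}, but {num_procs} MPI processes were launched")
    return problems


if __name__ == "__main__":
    # Configuration from the original post: -np 6 with P=2, Q=2.
    for problem in check_launch(num_procs=6, p=2, q=2, nodes=2, gpus_per_node=3):
        print("Problem:", problem)
    # Grids that would be consistent with 6 processes.
    print("Valid grids:", process_grids(6))
```

For 6 processes the valid grids include 2 x 3 and 3 x 2, which are the two suggested here; 1 x 6 and 6 x 1 also satisfy the rule.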

Let me know if that works or not!

~Mark
