I have a problem using CUDA HPL.
I always get a NaN residual and a FAILED check when N or NB is large.
Here are the specifications of my machines:
Nodes : 5
OS : CentOS 6.3
CPU : 2x Intel Xeon E5-2670 per node
GPU : Tesla M2090
Mem : 96 GB per node
InfiniBand : Mellanox QDR
And the software I use:
MPI : MVAPICH2-1.7 (I also tried Open MPI 1.4.5 and Intel MPI 4.1, but they failed too.)
BLAS : Intel MKL 11.0
CUDA HPL : hpl-2.0_FERMI_v13
Compiler : Intel compiler
Here is the output for N = 140000:
================================================================================
N : 140000
NB : 768 896 1024 1152 1280
PMAP : Row-major process mapping
P : 1
Q : 3
PFACT : Right
NBMIN : 8
NDIV : 2
RFACT : Right
BCAST : 2ringM
DEPTH : 1
SWAP : Mix (threshold = 128)
L1 : no-transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
================================================================================
T/V        N       NB    P  Q  Time     Gflops
WR13R2R8   140000  768   1  3  1328.07  1.377e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0034287 … PASSED
T/V        N       NB    P  Q  Time     Gflops
WR13R2R8   140000  896   1  3  1282.66  1.426e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0036567 … PASSED
T/V        N       NB    P  Q  Time     Gflops
WR13R2R8   140000  1024  1  3  1130.56  1.618e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= nan … FAILED
T/V        N       NB    P  Q  Time     Gflops
WR13R2R8   140000  1152  1  3  1124.15  1.627e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= nan … FAILED
T/V        N       NB    P  Q  Time     Gflops
WR13R2R8   140000  1280  1  3  1127.96  1.622e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= nan … FAILED
Finished 5 tests with the following results:
2 tests completed and passed residual checks,
3 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
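For reference, the Gflops column and the PASSED/FAILED verdicts above can be reproduced from the printed numbers. The Python sketch below assumes the operation count HPL uses for its rate, 2/3·N³ + 3/2·N², and a residual threshold of 16.0 (the value in the stock HPL.dat; the threshold is not shown in the output above). It also assumes the check is a plain "residual < threshold" comparison, which is why a nan residual can never pass. The helper names (hpl_gflops, verdict, summarize) are hypothetical.

```python
import math

# Rows copied from the report above: (NB, Time in seconds, reported Gflops, residual).
N = 140000
RUNS = [
    (768, 1328.07, "1.377e+03", "0.0034287"),
    (896, 1282.66, "1.426e+03", "0.0036567"),
    (1024, 1130.56, "1.618e+03", "nan"),
    (1152, 1124.15, "1.627e+03", "nan"),
    (1280, 1127.96, "1.622e+03", "nan"),
]

# Assumption: residual threshold of 16.0 (stock HPL.dat value, not printed above).
THRESHOLD = 16.0


def hpl_gflops(n: int, seconds: float) -> float:
    """Rate based on HPL's operation count of 2/3*N^3 + 3/2*N^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1.0e9


def verdict(residual: float, threshold: float = THRESHOLD) -> str:
    # Any comparison involving NaN is False, so a NaN residual always fails.
    return "PASSED" if residual < threshold else "FAILED"


def summarize():
    """Return (NB, computed Gflops, reported Gflops, verdict, residual is NaN) per run."""
    results = []
    for nb, seconds, reported, resid_text in RUNS:
        residual = float(resid_text)  # float("nan") parses to a NaN value
        computed = f"{hpl_gflops(N, seconds):.3e}"
        results.append((nb, computed, reported, verdict(residual), math.isnan(residual)))
    return results


if __name__ == "__main__":
    for nb, computed, reported, status, is_nan in summarize():
        print(f"NB={nb:5d} computed={computed} reported={reported} {status} nan={is_nan}")
```

The recomputed rates match the reported ones, so the timings of the failed runs look plausible; only the residual is broken.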
So with N = 140000, NB values below 1024 seem to be OK.
But when I used 5 nodes with N = 230000 and NB = 768, the run failed as well.
Further testing showed that with N = 230000, only runs with NB below 512 pass.
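One rough way to compare the two configurations is how much memory the dense N x N double-precision matrix alone takes on each node, 8·N² bytes divided among the processes. The Python sketch below assumes one MPI process per node in both runs (the process grid used for the 5-node run is not given) and ignores the right-hand side, panel workspace and GPU/pinned buffers, so it is only a lower bound. The helper name is hypothetical.

```python
BYTES_PER_DOUBLE = 8


def matrix_gb_per_process(n: int, processes: int) -> float:
    """Decimal GB of the N x N double matrix held by each process.

    Lower bound only: workspace and GPU-side buffers are not counted."""
    return BYTES_PER_DOUBLE * n * n / processes / 1.0e9


if __name__ == "__main__":
    node_gb = 96.0
    # Assumption: one MPI process per node (1x3 grid for N=140000, 5 processes for N=230000).
    for n, procs in [(140000, 3), (230000, 5)]:
        gb = matrix_gb_per_process(n, procs)
        print(f"N={n}: {gb:.1f} GB per node ({100 * gb / node_gb:.0f}% of {node_gb:.0f} GB)")
```

Under these assumptions the N = 140000 run holds about 52.3 GB of matrix per node, while the N = 230000 run holds about 84.6 GB on each 96 GB node.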
This is driving me crazy!
Can somebody tell me what's going on?