Hi, I am using NVIDIA HPL to run some benchmarks. It works well with smaller problem sizes (N) such as 9600 or 15000, but the GPU freezes when I use a larger problem size such as 20000. The estimated matrix size for N=20000 is 20000 * 20000 * 8 bytes ≈ 2.98 GiB, which is smaller than both the main memory size (16GB) and the GPU memory size (6GB).
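The memory estimate above (in binary gigabytes) can be reproduced with a short Python sketch. The helper name `hpl_matrix_gib` is hypothetical, and the per-process figure assumes the matrix is split evenly across the P x Q grid from HPL.dat (1 x 3 here). It only counts the matrix itself, so it ignores HPL workspace and any pinned host buffers the CUDA build may allocate.

```python
def hpl_matrix_gib(n, p=1, q=1, bytes_per_elem=8):
    """Return (total_gib, per_process_gib) for an N x N matrix of doubles.

    Assumption: the matrix is divided evenly over the p * q process grid.
    """
    total_bytes = n * n * bytes_per_elem   # N^2 elements, 8 bytes each
    total_gib = total_bytes / 2**30        # bytes -> GiB
    return total_gib, total_gib / (p * q)


# Values taken from the post: N = 20000, P = 1, Q = 3
total, per_proc = hpl_matrix_gib(20000, p=1, q=3)
print(f"total: {total:.2f} GiB, per process: {per_proc:.2f} GiB")
# prints "total: 2.98 GiB, per process: 0.99 GiB"
```

By this rough count each process holds about 1 GiB of the matrix, well under the 6GB on each card.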
Environment
- Intel(R) Xeon(R) CPU E5506 x 2
- Tesla C2070 x 4
- Ubuntu 14.04
- CUDA 7.5
The nvidia-smi command also froze after running HPL.
HPL.dat file
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
20000 Ns
1 # of NBs
1024 768 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
3 Qs
16.0 threshold
1 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
2 8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 2 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 0 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
192 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
1 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
dmesg info
[ 1423.979936] [<ffffffffc15fa429>] nv_map_dma_map_scatterlist+0x99/0x110 [nvidia]
[ 1423.980023] [<ffffffffc15fa8ef>] nv_dma_map_pages+0x19f/0x380 [nvidia]
[ 1423.980110] [<ffffffffc15ff91b>] ? os_mem_set+0x1b/0x30 [nvidia]
[ 1423.980196] [<ffffffffc15fac70>] nv_dma_map_alloc+0xd0/0x200 [nvidia]
[ 1423.980282] [<ffffffffc15e308f>] _nv012676rm+0x1ff/0x2b0 [nvidia]
[ 1423.980373] [<ffffffffc15ad0e9>] ? _nv010785rm+0x879/0x9b0 [nvidia]
[ 1423.980519] [<ffffffffc140bb45>] ? _nv011383rm+0x155/0x220 [nvidia]
[ 1423.980605] [<ffffffffc15f0e51>] ? _nv012569rm+0x231/0x5c0 [nvidia]
[ 1423.980690] [<ffffffffc15f0ccc>] ? _nv012569rm+0xac/0x5c0 [nvidia]
[ 1423.980775] [<ffffffffc15f0c74>] ? _nv012569rm+0x54/0x5c0 [nvidia]
[ 1423.980869] [<ffffffffc15984f8>] ? _nv000804rm+0x3128/0x37a0 [nvidia]
[ 1423.980963] [<ffffffffc15984b3>] ? _nv000804rm+0x30e3/0x37a0 [nvidia]
[ 1423.981056] [<ffffffffc1595513>] ? _nv000804rm+0x143/0x37a0 [nvidia]
[ 1423.981153] [<ffffffffc1571f74>] ? _nv002284rm+0xbb4/0x3990 [nvidia]
[ 1423.981240] [<ffffffffc15dd811>] ? _nv000719rm+0x281/0x440 [nvidia]
[ 1423.981327] [<ffffffffc15dd997>] ? _nv000719rm+0x407/0x440 [nvidia]
[ 1423.981413] [<ffffffffc15de0f8>] ? _nv000696rm+0x728/0x7d0 [nvidia]
[ 1423.981498] [<ffffffffc15e8153>] ? rm_ioctl+0x73/0x100 [nvidia]
[ 1423.981590] [<ffffffffc15f650d>] ? nvidia_ioctl+0x13d/0x430 [nvidia]
[ 1423.981675] [<ffffffffc15f4cff>] ? nvidia_frontend_ioctl+0x2f/0x70 [nvidia]
[ 1423.981761] [<ffffffffc15f4d5d>] ? nvidia_frontend_unlocked_ioctl+0x1d/0x30 [nvidia]
Does anyone have an idea what the problem is?
Thanks,
– LZ