I finally got the GPU version of HPL to run on my system, but the results are far below what they should be. I followed the "Howto - HPL on NVIDIA GPUs" guide, but my numbers are not comparable to what it leads me to expect. I’m using the following command in a bash script to run HPL across 2 nodes, each with 24 processors and 4 GPUs:
mpirun --machinefile machinefile --x LD_LIBRARY_PATH -np 8 xhpl
My machinefile simply lists each of the two nodes 4 times, once per GPU. Is this how I’m supposed to run the program, or do I need to change something in my run script or mpirun command? Currently I’m getting about 500 GFLOPS in total across the two nodes, when each M2070 GPU should be getting more than that on its own.
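For concreteness, here is a minimal sketch of how a machinefile with that layout could be generated. The hostnames `node01` and `node02` are placeholders rather than my real node names, and the variable names are only illustrative. It writes one line per GPU for each node, which gives 8 lines to match `-np 8`.

```sh
#!/bin/sh
# Sketch only: node01/node02 are placeholder hostnames (assumption).
nodes="node01 node02"
gpus_per_node=4

: > machinefile          # start from an empty file
np=0
for node in $nodes; do
  i=0
  # List the node once per GPU so mpirun places 4 ranks on it.
  while [ "$i" -lt "$gpus_per_node" ]; do
    echo "$node" >> machinefile
    i=$((i + 1))
    np=$((np + 1))
  done
done

# Total rank count: 2 nodes x 4 GPUs each.
echo "ranks: $np"
```

The resulting file has four `node01` lines followed by four `node02` lines, and the printed rank count is the value passed to `-np`.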