NVIDIA-provided LINPACK benchmarking software

I downloaded the LINPACK benchmark software from
https://developer.nvidia.com/rdp/assets/cuda-accelerated-linpack-linux64

and tried to run it on our Tesla P100 machine (3 GPUs on board).

The benchmark runs successfully as a single process. However, when I launch it with mpirun -np 2 ./run_linpack, it hangs (even with -np 1). One process uses 100% of a CPU core while the other is almost idle. I installed Open MPI on Ubuntu 16.04.
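One way to narrow this down is to check whether mpirun itself starts the expected number of processes and gives each a distinct rank, independently of the benchmark. The sketch below is a hypothetical helper (the file name check_ranks.py and the function launcher_info are illustrative, not part of the NVIDIA package). It assumes Open MPI's mpirun, which sets the OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_LOCAL_RANK environment variables in every process it launches. It only inspects the launcher's environment; it does not confirm that the benchmark binary was built against the same MPI library.

```python
import os
import socket


def launcher_info(env=None):
    """Collect rank/size info that Open MPI's mpirun exports to each process.

    Hypothetical diagnostic helper; -1 means the variable is not set,
    i.e. the process was probably not started by Open MPI's mpirun.
    """
    env = os.environ if env is None else env
    return {
        "host": socket.gethostname(),
        "rank": int(env.get("OMPI_COMM_WORLD_RANK", -1)),
        "size": int(env.get("OMPI_COMM_WORLD_SIZE", -1)),
        "local_rank": int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", -1)),
        # Which GPUs this process is allowed to see (if restricted at all).
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    }


if __name__ == "__main__":
    info = launcher_info()
    if info["rank"] < 0:
        print("OMPI_COMM_WORLD_* not set: not launched by Open MPI's mpirun")
    else:
        print(
            f"rank {info['rank']}/{info['size']} "
            f"(local rank {info['local_rank']}) on {info['host']}, "
            f"CUDA_VISIBLE_DEVICES={info['cuda_visible_devices']}"
        )
```

Running it as mpirun -np 2 python3 check_ranks.py should print one line per process with different ranks and a size of 2; if it also hangs, or every process reports the same rank, the problem is likely in the MPI setup rather than in the benchmark itself.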

Maybe I missed something. I would appreciate any suggestions.