HPL on Jetson TK1

I got the hpl-2.0_FERMI_v15 and I’m trying to make it work on my Jetson TK1. It doesn’t look like it was made to be cross-compiled. Has anyone done this already and have some pointers? Should I try to compile it on the Jetson rather than cross-compiling it?

I know nothing about the specific device, but compiling natively on the TK1 is usually preferable unless you have a specific need to build from another host. Cross-compile environments have lots of little details that can get in the way. Many embedded environments are not fully capable build environments, but the TK1 is an exception: it is more like a full desktop system than an embedded one.

OK, I’m trying to compile directly on the Jetson, but I’m having a problem with OpenMPI (which HPL says it needs). One site with a walk-through of HPL on a Dell (http://www.shanetarleton.com/linpack-with-cuda-install-guide/) reports problems with versions other than 1.4.5, but when I try to configure that version on the Jetson, I get the error “No atomic primitives available for armv7l-unknown-linux-gnueabihf” and it won’t configure. Has anyone gotten either HPL or OpenMPI to work on the Jetson TK1?

You’ll want to make sure that your OpenMPI is CUDA-aware. Here’s a page that describes some things to be aware of: https://www.open-mpi.org/faq/?category=runcuda
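As a quick sanity check, that FAQ mentions that `ompi_info` can report whether a build is CUDA-aware. A guarded sketch (assuming `ompi_info` is on the PATH once Open MPI is installed):

```shell
# Check whether the installed Open MPI was built with CUDA support.
# Prints "...:value:true" for a CUDA-aware build.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
else
    echo "ompi_info not found; is Open MPI installed?"
fi
```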

I would think that these instructions should work (replacing GotoBLAS with cuBLAS): http://www.shodor.org/media/content/petascale/materials/Tools/HPL/HPL_Lab_Exercise/HPL_Build_Instructions.pdf, but I haven’t tried it out myself yet. There are also these instructions for the Raspberry Pi, though again they’ll need to be hooked up to the CUDA MPI/BLAS equivalents: https://www.howtoforge.com/tutorial/hpl-high-performance-linpack-benchmark-raspberry-pi/. There’s also CUDA Fortran, which could probably be used in place of gfortran, but the tutorials don’t seem to require wiring that up in the configuration.
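For what it’s worth, the main edits in those HPL build instructions happen in the `Make.<arch>` file. The MPI section just points at wherever Open MPI is installed; the paths here are illustrative and assume the `--prefix=/usr/local` install shown elsewhere in this thread:

```
MPdir        = /usr/local
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpi.so
```

The `LAdir`/`LAinc`/`LAlib` variables in the same file are where a BLAS (GotoBLAS in that PDF) gets swapped for something else.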

I am trying to get HPL 2.1 to work on the Jetson TK1 as well. I have OpenMPI configured and running with CUDA support (you need OpenMPI 1.8 or later; I used the latest, 1.10.2):

./configure --prefix=/usr/local --with-cuda --enable-mpi-java

This assumes you have CUDA installed on the Jetsons.
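For reference, the full build sequence might look like the sketch below; the tarball name, prefix, and `-j` value are assumptions, so adjust for your setup:

```shell
# Build and install Open MPI 1.10.2 with CUDA support (sketch; assumes the
# tarball has already been downloaded from open-mpi.org).
set -e
if [ -f openmpi-1.10.2.tar.bz2 ]; then
    tar xjf openmpi-1.10.2.tar.bz2
    cd openmpi-1.10.2
    ./configure --prefix=/usr/local --with-cuda --enable-mpi-java
    make -j4            # the TK1 has four CPU cores; the build takes a while
    sudo make install
    sudo ldconfig       # refresh the linker cache so libmpi is found
else
    echo "openmpi-1.10.2.tar.bz2 not found; download it first"
fi
```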

I have gotten both OpenMPI 1.10.2 compiled and running with CUDA, as well as HPL 2.1 configured, compiled, and running on a cluster of Jetsons.
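For anyone following along: the run itself is controlled by the HPL.dat file next to the xhpl binary. An illustrative excerpt of the relevant lines (the values are assumptions for a small four-rank cluster and will need tuning; N is bounded by total memory, and P x Q must equal the number of MPI ranks):

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problem sizes (N)
8192         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
```

With P x Q = 4 as above, the run would be launched with something like `mpirun -np 4 -hostfile hosts ./xhpl` (hostfile name is illustrative).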

I will try cstotts’ suggestions.

UPDATE: I attempted to use cuBLAS as a drop-in replacement for BLAS, and it does not work. From what I’ve read, cuBLAS is not a complete reimplementation of the standard BLAS interface, so a wrapper is needed.
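If memory serves, the hpl-2.0_FERMI package mentioned at the top of this thread ships such a wrapper itself: its src/cuda directory builds a libdgemm library that intercepts DGEMM/DTRSM calls and splits the work between cuBLAS and a host BLAS. The linker section of its Make.CUDA then pulls in the wrapper, the CUDA libraries, and a real host BLAS; a sketch, where the paths and the choice of OpenBLAS are assumptions:

```
# Excerpt from a Make.<arch> in the style of the hpl-2.0_FERMI package.
# src/cuda builds libdgemm, the wrapper between HPL's BLAS calls and cuBLAS.
TOPdir = $(HOME)/hpl-2.0_FERMI_v15
LAdir  = /usr/local/cuda/lib
LAinc  = -I/usr/local/cuda/include
# Wrapper first, then the CUDA/cuBLAS libraries, then a host BLAS.
LAlib  = -L$(TOPdir)/src/cuda -ldgemm -L$(LAdir) -lcuda -lcudart -lcublas \
         -L/usr/lib -lopenblas
```

In other words, cuBLAS alone can’t be dropped into `LAlib` in place of a BLAS library; something has to sit in between.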