I am using JetPack 4.6 on a few Xavier NX boards that form a small cluster, and I noticed that the OpenMPI currently installed is not CUDA-aware, so it needs to be recompiled.
Maybe it could be considered for future JetPack releases?
======== UPDATE ========
I was trying to compile OpenMPI with CUDA-aware support by following the documentation (OpenMPI Build CUDA); however, GDRCopy is not meant to be used on Tegra, according to this post: Github GDRCopy.
Mat Colgrove also mentions that UCX is not necessary for CUDA-aware OMPI to work; see this SO post.
The OMPI documentation shows that a CUDA-enabled UCX build is expected to point to GDRCopy at configure time (./configure --prefix=/path/to/ucx-cuda-install --with-cuda=/usr/local/cuda --with-gdrcopy=/usr). Since GDRCopy will not compile on Tegra, I assume it can be omitted.
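Concretely, the GDRCopy-free build I have in mind would just drop that flag; this is a sketch with placeholder paths, not a verified recipe:

    # UCX configured with CUDA support but without GDRCopy (paths are placeholders)
    ./configure --prefix=/path/to/ucx-cuda-install --with-cuda=/usr/local/cuda
    make -j$(nproc)
    sudo make install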
When recompiling OpenMPI, should I stick to the default 2.1 version (flagged as retired on the OpenMPI page) that comes with Ubuntu 18.04 in JetPack 4.6, or is it OK to go to 4.1? If you have any suggestions, feel free to comment.
======== UPDATE 2 ========
I managed to compile UCX 1.11 (1.6, as suggested by the link above, is a no-go) and then OpenMPI 2.1.1 from the tarballs, both with CUDA support.
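For anyone following along, the OpenMPI configure line, going by the flags in the CUDA build documentation, looks roughly like this (install prefixes are placeholders, and --with-ucx is only relevant if you want OpenMPI to use the UCX built above and your OpenMPI version accepts that flag):

    # Open MPI built against the CUDA toolkit and, optionally, the freshly built UCX
    ./configure --prefix=/path/to/ompi-cuda-install \
                --with-cuda=/usr/local/cuda \
                --with-ucx=/path/to/ucx-cuda-install
    make -j$(nproc)
    sudo make install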
When I compile and run the compile-time and run-time checker program from the cuda-aware support page, it reports CUDA awareness at compile time, but not at run time.
Looking at the mpi-ext.h header it needs (which the OMPI build installed in a different directory, so I had to fix some symlinks for mpicc to find it), the compile-time check relies on the macro MPIX_CUDA_AWARE_SUPPORT being defined with value 1 in mpiext_cuda_c.h (the JetPack 4.6 factory version defines it as 0), and that part passes. However, the function MPIX_Query_cuda_support() does not return 1, so the run-time check fails (and run-time support is, I believe, what actually matters).
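In case it helps anyone debugging the same mismatch, these are the sanity checks I would try next; the ompi_info check is the one described in the Open MPI CUDA documentation, and the suspicion that the wrong install is being picked up at run time is only a guess based on the symlink juggling above:

    # Check that mpicc/mpirun resolve to the new build, not the JetPack factory one
    # (a plausible cause of "CUDA-aware at compile time but not at run time")
    which mpicc mpirun
    mpicc -showme          # prints the include/lib paths the wrapper really uses

    # Ask the installed Open MPI whether it was built with CUDA support
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
    # Expected for a CUDA-aware build:
    # mca:mpi:base:param:mpi_built_with_cuda_support:value:true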
If anyone has had luck with CUDA awareness on Tegra, please let me know.