@dusty_nv, I can’t seem to post in the long-running thread, so I’m putting this here. I’ve followed all the advice there and still can’t get things working. I need torchvision for a library.
When I compile torchvision, I get this error:
OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
Running sudo find / -name 'libmpi*' returns this:
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.40.20.2
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.40
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so
/usr/lib/aarch64-linux-gnu/libmpi.so.40.20.3
/usr/lib/aarch64-linux-gnu/libmpi_java.so
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.40.20.1
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.40
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.40.20.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.40
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_mpifh.so.40.20.2
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempif08.so
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so.40.20.3
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_java.so
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_mpifh.so
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so.40.20.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_java.so.40.20.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempif08.so.40.21.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so
/usr/lib/aarch64-linux-gnu/libmpi.so.40
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.40
/usr/lib/aarch64-linux-gnu/libmpi_java.so.40.20.0
/usr/lib/aarch64-linux-gnu/libmpi_java.so.40
/usr/lib/aarch64-linux-gnu/libmpi++.so
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.40.21.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so
/usr/lib/aarch64-linux-gnu/libmpi.so
/etc/alternatives/libmpi++.so-aarch64-linux-gnu
/etc/alternatives/libmpi.so-aarch64-linux-gnu
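This is a narrower check than the full find listing. It uses two hypothetical helpers, report_mpi_cxx and has_soname. These names are illustrative and are not part of OpenMPI or any package. The first lists only the libmpi_cxx files in the two OpenMPI directories, and the second tests for one specific soname. Both only read the file system.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting the OpenMPI C++ bindings (read-only).

# report_mpi_cxx DIR...: print the distinct libmpi_cxx file names found in the
# given directories, skipping dangling symlinks.
report_mpi_cxx() {
  local dir f
  for dir in "$@"; do
    for f in "$dir"/libmpi_cxx.so*; do
      [ -e "$f" ] || continue      # unmatched glob or dangling symlink
      printf '%s\n' "${f##*/}"     # strip the directory, keep the file name
    done
  done | sort -u
}

# has_soname DIR NAME: succeed only if DIR/NAME exists (following symlinks).
has_soname() {
  [ -e "$1/$2" ]
}

# Example on the Jetson:
#   report_mpi_cxx /usr/lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu/openmpi/lib
#   has_soname /usr/lib/aarch64-linux-gnu libmpi_cxx.so.20 || echo "libmpi_cxx.so.20 is missing"
```

Going by the find output above, the listing should show only libmpi_cxx.so, libmpi_cxx.so.40 and libmpi_cxx.so.40.20.1. Nothing on the system appears to provide the libmpi_cxx.so.20 named in the error.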
I added this line to my bash configuration:
export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH
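The sketch below is a simplified model of how the dynamic loader uses LD_LIBRARY_PATH after the export above. It walks the colon-separated directories from left to right and takes the first match. find_in_ld_path is a hypothetical helper. It deliberately ignores RPATH/RUNPATH, the ldconfig cache and the default system directories, all of which the real loader also consults.

```shell
#!/usr/bin/env bash
# find_in_ld_path NAME: hypothetical helper that prints the first
# LD_LIBRARY_PATH directory entry containing NAME; returns 1 if none does.
# Simplified model only: the real loader also checks RPATH/RUNPATH,
# /etc/ld.so.cache and the default system directories.
find_in_ld_path() {
  local name="$1" dir
  local -a dirs
  IFS=':' read -r -a dirs <<< "${LD_LIBRARY_PATH:-}"   # split on ':' without globbing
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || continue        # empty entries are skipped here for simplicity
    if [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Example after the export:
#   find_in_ld_path libmpi_cxx.so.20 || echo "libmpi_cxx.so.20 is not on LD_LIBRARY_PATH"
```

Per the find output, the exported openmpi/lib directory contains no file named libmpi_cxx.so.20. This lookup would therefore still come up empty for the name in the error.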
I tried adding libmpi_cxx.20 symlinks pointing to each of these files:
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.40
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1
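The symlink attempt described above can be sketched as a hypothetical helper. link_soname is an illustrative name, and the exact commands originally run are not shown in this post. One caveat: a soname bump from .so.20 to .so.40 normally signals an ABI change between OpenMPI major versions. A link like this can therefore get past the file lookup and still fail later, for example on missing symbols or on another .so.20 dependency.

```shell
#!/usr/bin/env bash
# link_soname LIBDIR TARGET NEW_NAME: hypothetical helper that creates
# LIBDIR/NEW_NAME as a relative symlink to an existing LIBDIR/TARGET.
# Caveat: this only satisfies the loader's file lookup; it does not make
# an OpenMPI 4 library ABI-compatible with one built against OpenMPI 2.
link_soname() {
  local libdir="$1" target="$2" new_name="$3"
  if [ ! -e "$libdir/$target" ]; then
    echo "missing target: $libdir/$target" >&2
    return 1
  fi
  ln -sfn "$target" "$libdir/$new_name"   # -f replaces an existing link
}

# On the Jetson, from a root shell (e.g. after sudo -i), then refresh the cache:
#   link_soname /usr/lib/aarch64-linux-gnu libmpi_cxx.so.40 libmpi_cxx.so.20
#   ldconfig
```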
I purged and reinstalled the OpenMPI runtime and dev packages, as you suggested:
sudo apt-get purge -y libopenmpi-dev libopenmpi* openmpi-bin && \
sudo apt-get install -y libopenmpi-dev openmpi-bin
I’m using this torch wheel: torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
For torchvision, this is my branch:
* release/0.16
I also tried the v0.16.1 tag and got the same result.
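Building torchvision imports torch, so the missing library is presumably a dependency of one of the shared libraries bundled in the torch wheel. The hypothetical helper below could show which one. unresolved_deps is an illustrative name. It runs ldd over a directory and prints only the unresolved entries. The package is located with importlib.util.find_spec, which avoids the failing import.

```shell
#!/usr/bin/env bash
# unresolved_deps DIR: hypothetical helper that prints each shared library in
# DIR that ldd reports as having unresolved dependencies, followed by those
# "=> not found" lines.
unresolved_deps() {
  local lib missing
  for lib in "$1"/*.so*; do
    [ -f "$lib" ] || continue                      # unmatched glob
    missing=$(ldd "$lib" 2>/dev/null | grep 'not found')
    if [ -n "$missing" ]; then
      printf '%s\n%s\n' "$lib" "$missing"
    fi
  done
}

# Locate the installed torch package without importing it, then scan its libs:
#   torch_dir=$(python3 -c "import importlib.util as u; print(u.find_spec('torch').submodule_search_locations[0])")
#   unresolved_deps "$torch_dir/lib"
```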
Any ideas?