MPI compile include path error with nvidia container runtime

NVIDIA Container Toolkit version: 1.16.1

dev@dev_env:~$ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.16.1
commit: a470818ba7d9166be282cd0039dd2fc9b0a34d73

When I launch the Isaac ROS container without the NVIDIA container runtime, mpicc -showme:compile reports these include flags:

dev@dev_env:~$ docker run -it nvcr.io/nvidia/isaac/ros:x86_64-ros2_humble_45d368cdbbe4a484643464d0d492c764 /bin/bash

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.10 (build 72127154)
Triton Server Version 2.39.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

root@2681ac82d08a:/opt/tritonserver# mpicc -showme:compile
-I/opt/hpcx/ompi/include -I/opt/hpcx/ompi/include/openmpi -I/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -I/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -I/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include

These paths are correct; they match the actual layout under /opt/hpcx/ompi in the container:

root@2681ac82d08a:/opt/tritonserver# ll /opt/hpcx/ompi
total 28
drwxr-xr-x  7 root root 4096 Oct  4  2023 ./
drwxr-xr-x 10 root root 4096 Oct  4  2023 ../
drwxr-xr-x  2 root root 4096 Oct  4  2023 bin/
drwxr-xr-x  2 root root 4096 Oct  4  2023 etc/
drwxr-xr-x  6 root root 4096 Oct  4  2023 include/
drwxr-xr-x  5 root root 4096 Oct  4  2023 lib/
drwxr-xr-x  5 root root 4096 Oct  4  2023 share/

But when I launch the same image with the NVIDIA container runtime (--runtime nvidia), the include paths change:

dev@dev_env:~$ docker run -it --runtime nvidia nvcr.io/nvidia/isaac/ros:x86_64-ros2_humble_45d368cdbbe4a484643464d0d492c764 /bin/bash

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.10 (build 72127154)
Triton Server Version 2.39.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

root@f34d1bfc4890:/opt/tritonserver# mpicc -showme:compile
-I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include

These include paths are wrong: each one has an extra /lib/x86_64-linux-gnu/openmpi inserted after the /opt/hpcx/ompi prefix, so they point at directories that do not exist.
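A quick way to confirm this symptom without going through CMake is to test each -I directory that mpicc reports. The Python sketch below does that; it assumes Python 3.9+ and that mpicc is on PATH, and the helper names (include_dirs, missing_include_dirs, check_mpicc) are my own, not part of any NVIDIA or Open MPI tool.

```python
# Sketch: list the -I directories reported by `mpicc -showme:compile`
# that do not exist. Assumes the Open MPI wrapper `mpicc` is on PATH
# (as in the Isaac ROS / Triton image above). Helper names are hypothetical.
import os
import shlex
import shutil
import subprocess


def include_dirs(showme_output: str) -> list[str]:
    """Collect directories from '-I<dir>' and '-I <dir>' flags, in order."""
    tokens = shlex.split(showme_output)
    dirs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])  # separated form: -I <dir>
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])  # joined form: -I<dir>
        i += 1
    return dirs


def missing_include_dirs(showme_output: str) -> list[str]:
    """Return the include directories that are not existing directories."""
    return [d for d in include_dirs(showme_output) if not os.path.isdir(d)]


def check_mpicc() -> list[str]:
    """Run mpicc -showme:compile and return its missing include dirs."""
    result = subprocess.run(
        ["mpicc", "-showme:compile"],
        capture_output=True,
        text=True,
        check=True,
    )
    return missing_include_dirs(result.stdout)


# Only query the real toolchain when mpicc is actually installed.
if __name__ == "__main__" and shutil.which("mpicc"):
    missing = check_mpicc()
    if missing:
        for d in missing:
            print("missing:", d)
    else:
        print("all mpicc include directories exist")
```

In a container started with --runtime nvidia, I would expect this to list the /opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/... paths shown above; without the runtime it should report that all directories exist.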

These bad include paths break even a minimal CMake project:

cmake_minimum_required(VERSION 3.11)
project(foo)
find_package(MPI QUIET)

which fails with the following error:

root@608c7b4ae17c:/test# cmake .
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error in /test/CMakeFiles/CMakeTmp/CMakeLists.txt:
  Imported target "MPI::MPI_C" includes non-existent path

    "/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include"

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.



CMake Error in /test/CMakeFiles/CMakeTmp/CMakeLists.txt:
  Imported target "MPI::MPI_C" includes non-existent path

    "/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include"

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.



CMake Error at /usr/share/cmake-3.22/Modules/FindMPI.cmake:1264 (try_compile):
  Failed to generate test project build system.
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindMPI.cmake:1315 (_MPI_try_staged_settings)
  /usr/share/cmake-3.22/Modules/FindMPI.cmake:1638 (_MPI_check_lang_works)
  CMakeLists.txt:3 (find_package)


-- Configuring incomplete, errors occurred!
See also "/test/CMakeFiles/CMakeOutput.log".
root@608c7b4ae17c:/test# 

How does the NVIDIA container runtime alter these paths, and how can I fix this?

Hi @heiscsy, I am facing the same issue. Did you find any solution for this?

Hi @jasmeet1

I think NVIDIA will have to fix this properly, but in the meantime I found two ways to work around it:

  1. Create a symbolic link from the wrongly reported path to the real MPI installation.
     I used the released image as a base and added two lines in a new Dockerfile to build a patched image:

       FROM nvcr.io/nvidia/isaac/ros:x86_64-ros2_humble_45d368cdbbe4a484643464d0d492c764 as base

       RUN mkdir -p /opt/hpcx/ompi/lib/x86_64-linux-gnu
       RUN ln -s /opt/hpcx/ompi /opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi

  2. Force-reconfigure the libc-bin package. I found that the include paths correct themselves afterwards:

       sudo dpkg-reconfigure libc-bin

     You can run this manually, or add it to your build scripts before building.
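To put the second workaround into a build script so it only runs when needed, one option is to check the mpicc include paths first and force-reconfigure libc-bin only if some are missing. This is a minimal Python sketch, assuming it runs as root inside the container (so no sudo) with mpicc on PATH; the function names are hypothetical.

```python
# Sketch: run `dpkg-reconfigure libc-bin` only when mpicc reports
# include directories that do not exist. Assumes root inside the
# container (no sudo) and mpicc on PATH; all names are hypothetical.
import os
import shlex
import shutil
import subprocess
from typing import Callable


def showme_compile() -> str:
    """Return the compile flags the Open MPI wrapper would add."""
    return subprocess.run(
        ["mpicc", "-showme:compile"], capture_output=True, text=True, check=True
    ).stdout


def reconfigure_libc_bin() -> None:
    """The second workaround above: force-reconfigure libc-bin."""
    subprocess.run(["dpkg-reconfigure", "libc-bin"], check=True)


def missing_dirs(flags: str) -> list[str]:
    """Directories from joined -I<dir> flags that do not exist."""
    return [
        tok[2:]
        for tok in shlex.split(flags)
        if tok.startswith("-I") and len(tok) > 2 and not os.path.isdir(tok[2:])
    ]


def ensure_mpi_includes(
    showme: Callable[[], str] = showme_compile,
    reconfigure: Callable[[], None] = reconfigure_libc_bin,
) -> list[str]:
    """Apply the workaround if needed; return dirs still missing afterwards."""
    if not missing_dirs(showme()):
        return []  # paths are fine, nothing to do
    reconfigure()
    return missing_dirs(showme())  # re-check after the workaround


# Only touch the real toolchain when mpicc is installed.
if __name__ == "__main__" and shutil.which("mpicc"):
    still_missing = ensure_mpi_includes()
    if still_missing:
        raise SystemExit("mpicc include dirs still missing: " + " ".join(still_missing))
```

Passing showme and reconfigure as parameters keeps the check testable without a real MPI install; in a container you would just run the script before the CMake configure step.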

Best
Yuhan