NVIDIA Container Toolkit version 1.16.1:
dev@dev_env:~$ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.16.1
commit: a470818ba7d9166be282cd0039dd2fc9b0a34d73
When I launch the Isaac ROS container without the NVIDIA container runtime, mpicc reports these compile flags:
dev@dev_env:~$ docker run -it nvcr.io/nvidia/isaac/ros:x86_64-ros2_humble_45d368cdbbe4a484643464d0d492c764 /bin/bash
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.10 (build 72127154)
Triton Server Version 2.39.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
root@2681ac82d08a:/opt/tritonserver# mpicc -showme:compile
-I/opt/hpcx/ompi/include -I/opt/hpcx/ompi/include/openmpi -I/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -I/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -I/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include
These include paths are correct and match the actual layout inside the container, where /opt/hpcx/ompi/include exists:
root@2681ac82d08a:/opt/tritonserver# ll /opt/hpcx/ompi
total 28
drwxr-xr-x 7 root root 4096 Oct 4 2023 ./
drwxr-xr-x 10 root root 4096 Oct 4 2023 ../
drwxr-xr-x 2 root root 4096 Oct 4 2023 bin/
drwxr-xr-x 2 root root 4096 Oct 4 2023 etc/
drwxr-xr-x 6 root root 4096 Oct 4 2023 include/
drwxr-xr-x 5 root root 4096 Oct 4 2023 lib/
drwxr-xr-x 5 root root 4096 Oct 4 2023 share/
But when I launch the container with the NVIDIA container runtime:
dev@dev_env:~$ docker run -it --runtime nvidia nvcr.io/nvidia/isaac/ros:x86_64-ros2_humble_45d368cdbbe4a484643464d0d492c764 /bin/bash
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.10 (build 72127154)
Triton Server Version 2.39.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
root@f34d1bfc4890:/opt/tritonserver# mpicc -showme:compile
-I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -I/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include
The include paths that mpicc reports are now wrong: each one has an extra /lib/x86_64-linux-gnu/openmpi segment inserted after /opt/hpcx/ompi, so it points to a directory that does not exist in the image.
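To confirm which of the reported directories actually exist, a small shell loop over the -I flags can help. This is only a sketch: check_include_dirs is a hypothetical helper name of my own, not part of Open MPI or the NVIDIA toolkit, and it assumes none of the include paths contain spaces.

```shell
# Print every -I directory found in a list of compiler flags,
# prefixed with "ok" if the directory exists and "missing" otherwise.
check_include_dirs() {
  for flag in "$@"; do
    case "$flag" in
      -I*)
        dir="${flag#-I}"   # strip the leading -I
        if [ -d "$dir" ]; then
          echo "ok $dir"
        else
          echo "missing $dir"
        fi
        ;;
    esac
  done
}

# Only query mpicc when it is actually on PATH.
if command -v mpicc >/dev/null 2>&1; then
  # Word splitting of the flag string is intentional here.
  check_include_dirs $(mpicc -showme:compile)
fi
```

Running it in both containers should show every path as ok without the NVIDIA runtime and as missing with it, if the paths above are the whole story.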
The wrong include paths break even a minimal CMake project such as:
cmake_minimum_required(VERSION 3.11)
project(foo)
find_package(MPI QUIET)
The configure step fails with the following error:
root@608c7b4ae17c:/test# cmake .
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error in /test/CMakeFiles/CMakeTmp/CMakeLists.txt:
Imported target "MPI::MPI_C" includes non-existent path
"/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.
CMake Error in /test/CMakeFiles/CMakeTmp/CMakeLists.txt:
Imported target "MPI::MPI_C" includes non-existent path
"/opt/hpcx/ompi/lib/x86_64-linux-gnu/openmpi/include"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.
CMake Error at /usr/share/cmake-3.22/Modules/FindMPI.cmake:1264 (try_compile):
Failed to generate test project build system.
Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/FindMPI.cmake:1315 (_MPI_try_staged_settings)
/usr/share/cmake-3.22/Modules/FindMPI.cmake:1638 (_MPI_check_lang_works)
CMakeLists.txt:3 (find_package)
-- Configuring incomplete, errors occurred!
See also "/test/CMakeFiles/CMakeOutput.log".
root@608c7b4ae17c:/test#
How does the NVIDIA container runtime alter these paths, and how can I fix this?
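One thing that might narrow this down is comparing which mpicc binary is picked up, and whether OPAL_PREFIX is set, in each of the two containers. As far as I understand, Open MPI's wrapper compilers derive their include and library paths from the installation prefix, which the OPAL_PREFIX environment variable can override, so a difference in either value between the two runs would be a useful clue. The following is a diagnostic sketch, not a fix: mpi_wrapper_report is a hypothetical helper name, and it assumes readlink -f is available (GNU coreutils).

```shell
# Report which wrapper compiler is found on PATH, what it resolves to,
# and the environment values that can influence the paths it prints.
mpi_wrapper_report() {
  wrapper="${1:-mpicc}"
  path="$(command -v "$wrapper" 2>/dev/null || true)"
  echo "wrapper: ${path:-not found}"
  if [ -n "$path" ]; then
    # Follow symlinks to see the real binary behind the name.
    echo "resolved: $(readlink -f "$path")"
  fi
  echo "OPAL_PREFIX: ${OPAL_PREFIX:-<unset>}"
  echo "PATH: $PATH"
}

mpi_wrapper_report mpicc
```

If the two containers print different wrapper or OPAL_PREFIX values, that would suggest the difference comes from the environment set up at container start rather than from the MPI installation itself.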