Compilation broken after upgrading from NVHPC 23.7 to NVHPC 24.5

After upgrading from NVHPC 23.7 to NVHPC 24.5, the compilation broke. I get the following compilation errors:

[8/52] Building CXX object hpc/unit_testing/CMakeFiles/NcclMpiTest.dir/NcclMpiTest.cpp.o
FAILED: hpc/unit_testing/CMakeFiles/NcclMpiTest.dir/NcclMpiTest.cpp.o 
/bin/c++ -DGIT_COMMIT_SUBJECT="\"Integrate with gcc11.4 without cuda\"" -DGIT_DATE="\"Mon Jun 10 14:17:46 2024\"" -DGIT_SHA1=\"9be6c8e\" -DGIT_TAG=\"X1000V2.0-149-g9be6c8e\" -I/usr/include/opencv4 -I/home/Yehonatans/work/utils/hpc -I/home/Yehonatans/work/utils/general -I/home/Yehonatans/work/utils/threads -I/home/Yehonatans/work/utils/cuda -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cmake/../comm_libs/12.4/nccl/include -Wall -pedantic -std=gnu++17 -pthread -Werror -MD -MT hpc/unit_testing/CMakeFiles/NcclMpiTest.dir/NcclMpiTest.cpp.o -MF hpc/unit_testing/CMakeFiles/NcclMpiTest.dir/NcclMpiTest.cpp.o.d -o hpc/unit_testing/CMakeFiles/NcclMpiTest.dir/NcclMpiTest.cpp.o -c /home/Yehonatans/work/utils/hpc/unit_testing/NcclMpiTest.cpp
In file included from /home/Yehonatans/work/utils/hpc/NcclGpuSyncObject.h:5,
                 from /home/Yehonatans/work/utils/hpc/unit_testing/NcclMpiTest.cpp:5:
/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/nccl/include/nccl.h:10:10: fatal error: cuda_runtime.h: No such file or directory
   10 | #include <cuda_runtime.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
[19/52] Building CXX object hpc/unit_testing/CMakeFiles/NcclTesting.dir/NcclTesting.cpp.o
FAILED: hpc/unit_testing/CMakeFiles/NcclTesting.dir/NcclTesting.cpp.o 
/bin/c++ -DGIT_COMMIT_SUBJECT="\"Integrate with gcc11.4 without cuda\"" -DGIT_DATE="\"Mon Jun 10 14:17:46 2024\"" -DGIT_SHA1=\"9be6c8e\" -DGIT_TAG=\"X1000V2.0-149-g9be6c8e\" -DGTEST_LINKED_AS_SHARED_LIBRARY=1 -I/usr/include/opencv4 -I/home/Yehonatans/work/utils/hpc -I/home/Yehonatans/work/utils/general -I/home/Yehonatans/work/utils/threads -I/home/Yehonatans/work/utils/cuda -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/hpcx/hpcx-2.19/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cmake/../comm_libs/12.4/nccl/include -Wall -pedantic -std=gnu++17 -pthread -Werror -MD -MT hpc/unit_testing/CMakeFiles/NcclTesting.dir/NcclTesting.cpp.o -MF hpc/unit_testing/CMakeFiles/NcclTesting.dir/NcclTesting.cpp.o.d -o hpc/unit_testing/CMakeFiles/NcclTesting.dir/NcclTesting.cpp.o -c /home/Yehonatans/work/utils/hpc/unit_testing/NcclTesting.cpp
In file included from /home/Yehonatans/work/utils/hpc/INcclDispatcherObject.h:4,
                 from /home/Yehonatans/work/utils/hpc/NcclMultiThreadDispatcher.h:7,
                 from /home/Yehonatans/work/utils/hpc/unit_testing/NcclTesting.cpp:5:
/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/comm_libs/12.4/nccl/include/nccl.h:10:10: fatal error: cuda_runtime.h: No such file or directory
   10 | #include <cuda_runtime.h>
      |          ^~~~~~~~~~~~~~~~

What changed between the two releases that makes these errors appear?

I don’t see any change between the two releases that would cause this error, so I assume something is different in your build.

The error occurs because the compiler can’t find the “cuda_runtime.h” header file that “nccl.h” includes. To fix it, add the path to this header as a compiler option, like “-I/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/include”, or, if you compile with nvc++ (your log shows /bin/c++), you can add the “-cuda” flag and the compiler will implicitly include this path.

-DGIT_COMMIT_SUBJECT="\"Integrate with gcc11.4 without cuda\""

Does this message imply that you’re building without CUDA? NCCL requires CUDA.

Hi,
I tried to reproduce the issue without success.
Sorry to bother you; it seems to be a setup issue.
