I am new to the HPC SDK and have been trying to create a CMake-based development setup on Ubuntu 20.04. I followed the instructions in the NVHPCConfig.cmake shipped with the SDK by NVIDIA and created my CMakeLists.txt with the prefix pointing to the HPC SDK cmake folder where NVHPCConfig.cmake resides. This should have been sufficient for me to link my executable against the HPC SDK. However, as it turned out, the provided NVHPCConfig.cmake does not configure the OpenMP or OpenACC offload runtime libraries, or I have missed or misunderstood something. I tested my setup by trying to build and run the CUDA-Libraries examples from test_fft_oacc_cpp and test_fft_omp_fft with my CMakeLists file and got errors related to ompx_get_cuda_stream and acc_cuda_stream. I request the HPC SDK team to please help plug this information gap. I have attached my CMakeLists.txt file. Could you please advise what amendments or additions need to be made to my CMake file so that I can run these examples shipped with HPC SDK 23.7? Thank you so much!
CMakeLists.txt (2.0 KB)
I have got to a point where I can compile and link the executable. I had quite a few errors in my last CMake file, which I have fixed. However, when I run it, all I get is a seg fault; there are no compile errors, no linking errors and no warnings. I am attaching the updated CMakeLists.txt and the main source file, which is essentially one of your CUDA-Libraries FFT-with-OpenMP C++ examples. A little help will go a long way here. When I invoke CMake with "cmake …" followed by make, this is the output:
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ cmake …
– The CUDA compiler identification is NVIDIA 12.2.91
– The CXX compiler identification is NVHPC 23.7.0
– The C compiler identification is NVHPC 23.7.0
– Detecting CUDA compiler ABI info
– Detecting CUDA compiler ABI info - done
– Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/nvcc - skipped
– Detecting CUDA compile features
– Detecting CUDA compile features - done
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info - done
– Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/nvc++ - skipped
– Detecting CXX compile features
– Detecting CXX compile features - done
– Detecting C compiler ABI info
– Detecting C compiler ABI info - done
– Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/nvc - skipped
– Detecting C compile features
– Detecting C compile features - done
– Found MPI_C: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/11.8/openmpi4/openmpi-4.1.5/lib/libmpi.so (found version “3.1”)
– Found MPI_CXX: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/11.8/openmpi4/openmpi-4.1.5/lib/libmpi.so (found version “3.1”)
– Found MPI: TRUE (found version “3.1”)
– Found OpenMP_C: -mp (found version “5.1”)
– Found OpenMP_CXX: -mp (found version “5.1”)
– Found OpenMP: TRUE (found version “5.1”)
– CUDA version selected: 11.8
– Configuring done (3.4s)
– Generating done (0.0s)
– Build files have been written to: /home/abhimehrish/Desktop/cuda_hpc/build
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ make
[ 50%] Building CUDA object CMakeFiles/cuda_hpc.dir/main.cu.o
[100%] Linking CUDA executable cuda_hpc
[100%] Built target cuda_hpc
CMakeLists.txt (4.5 KB)
main.txt is main.cu
main.txt (3.7 KB)
thank you!
Hi Abhi,
Is it your intent to use nvcc given this is a C++ code using OpenMP target offload?
The problem here is that CUDA C projects use "g++" to link the code, which in turn causes the GNU "GOMP" OpenMP runtime library to be used. If this is really what you want, we might get it to work using GCC's offload support, but you'd need to upgrade to a more recent GCC (like 11.x or newer).
If you rename the file from "main.cu" to "main.cpp", then you'll be using nvc++ to both build and link.
I modified your CMakeLists.txt file as follows. Note that I made a few other changes to make it easier to switch between compiler versions, and commented out the unneeded parts. I set it back to what I think is on your system, but please double-check.
cmake_minimum_required(VERSION 3.28.3)
# Clear potentially conflicting environment variables
set(ENV{CC} "")
set(ENV{CXX} "")
set(ENV{FC} "")
set(ENV{F77} "")
set(ENV{F90} "")
set(ENV{CUDACXX} "")
set(ENV{CUDAHOSTCXX} "")
set(ENV{CMAKE_PREFIX_PATH} "")
set(ENV{LIBRARY_PATH} "")
set(ENV{LD_LIBRARY_PATH} "")
# Set the NVHPC root directory (adjust the version and path as necessary)
set(NVHPC_ROOT_DIR "/opt/nvidia/hpc_sdk/")
set(NVHPC_VERSION "23.7")
# Specify the default NVHPC CUDA version
set(NVHPC_CUDA_VERSION "11.8")
set(NVHPC_PATH "${NVHPC_ROOT_DIR}/Linux_x86_64/${NVHPC_VERSION}")
set(CMAKE_CUDA_COMPILER "${NVHPC_PATH}/compilers/bin/nvcc" CACHE FILEPATH "CUDA Compiler")
set(CMAKE_CXX_COMPILER "${NVHPC_PATH}/compilers/bin/nvc++" CACHE FILEPATH "CXX Compiler")
set(CMAKE_C_COMPILER "${NVHPC_PATH}/compilers/bin/nvc" CACHE FILEPATH "C Compiler")
project(cuda_hpc CUDA CXX C) # enable CUDA language
# Skip RPATH for specific directories if needed
#set(CMAKE_SKIP_RPATH TRUE)
#set(CMAKE_SKIP_RPATH_LIST "/usr/lib/gcc/x86_64-linux-gnu/11")
# Set some sensible default CUDA architectures.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 86)
  message(STATUS "Setting default CUDA architectures to ${CMAKE_CUDA_ARCHITECTURES}")
endif()
# See https://gitlab.kitware.com/cmake/cmake/-/issues/23081, this should not be needed according
# to the CMake documentation, but it is not clear that any version behaves as documented.
if(DEFINED CMAKE_CUDA_HOST_COMPILER)
  unset(ENV{CUDAHOSTCXX})
endif()
# Enable CUDA language support.
enable_language(CUDA)
# Prefer shared libcudart.so
if(${CMAKE_VERSION} VERSION_LESS 3.17)
  # Ugly workaround from https://gitlab.kitware.com/cmake/cmake/-/issues/17559, remove when
  # possible
  if(CMAKE_CUDA_HOST_IMPLICIT_LINK_LIBRARIES)
    list(REMOVE_ITEM CMAKE_CUDA_HOST_IMPLICIT_LINK_LIBRARIES "cudart_static")
    list(REMOVE_ITEM CMAKE_CUDA_HOST_IMPLICIT_LINK_LIBRARIES "cudadevrt")
    list(APPEND CMAKE_CUDA_HOST_IMPLICIT_LINK_LIBRARIES "cudart")
  endif()
  if(CMAKE_CUDA_IMPLICIT_LINK_LIBRARIES)
    list(REMOVE_ITEM CMAKE_CUDA_IMPLICIT_LINK_LIBRARIES "cudart_static")
    list(REMOVE_ITEM CMAKE_CUDA_IMPLICIT_LINK_LIBRARIES "cudadevrt")
    list(APPEND CMAKE_CUDA_IMPLICIT_LINK_LIBRARIES "cudart")
  endif()
else()
  # nvc++ -cuda implicitly links dynamically to libcudart.so. Setting this makes sure that CMake
  # does not add -lcudart_static and trigger errors due to mixed dynamic/static linkage.
  set(CMAKE_CUDA_RUNTIME_LIBRARY Shared)
endif()
set(NVHPC_ENABLE "ON")
add_definitions("-DUSE_NVHPC")
set(CMAKE_PREFIX_PATH "${NVHPC_PATH}/cmake" CACHE PATH "CMake prefix path")
#list(APPEND CMAKE_PREFIX_PATH "${NVHPC_PATH}/comm_libs/12.4/openmpi4/openmpi-4.1.5/lib/cmake/ucx")
set(MPI_C "${NVHPC_PATH}/comm_libs/${NVHPC_CUDA_VERSION}/openmpi4/openmpi-4.1.5/bin/mpicc" CACHE FILEPATH "MPI_C")
set(MPI_CXX "${NVHPC_PATH}/comm_libs/${NVHPC_CUDA_VERSION}/openmpi4/openmpi-4.1.5/bin/mpic++" CACHE FILEPATH "MPI_CXX")
set(CMAKE_CUDA_STANDARD 17)
find_package(NVHPC REQUIRED COMPONENTS CUDA MATH HOSTUTILS NVSHMEM NCCL PROFILER)
find_package(OpenMP)
set(CMAKE_CUDA_FLAGS
"${CMAKE_CUDA_FLAGS} -ccbin ${NVHPC_PATH}/compilers/bin/nvc++ -Xcompiler -mp=gpu -gpu=nordc")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -cuda -mp=gpu -cudalib=cufft")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")
# Set CUDA flags for debugging if necessary
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -g -G") # enable cuda-gdb
endif()
# Add the executable
add_executable(cuda_hpc "main.cpp")
#target_include_directories(cuda_hpc PRIVATE ${NVHPC_PATH}/compilers/include)
# Link NVHPC and OpenMP libraries
#target_link_libraries(cuda_hpc PRIVATE
# NVHPC::CUDA
# NVHPC::MATH
# NVHPC::HOSTUTILS
# NVHPC::NVSHMEM
# NVHPC::NCCL
# NVHPC::MPI
# NVHPC::PROFILER
# OpenMP::OpenMP_CXX
# ${NVHPC_PATH}/compilers/lib/libomp.so # Explicitly link to HPC SDK's OpenMP library
# ${NVHPC_PATH}/compilers/lib/libnvc.so # Explicitly link to the NVHPC C library
#)
# Ensure OpenMP flags are set for CUDA compilation
#target_compile_options(cuda_hpc PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-fopenmp>)
# Adjust RPATH to prioritize HPC SDK libraries
set_target_properties(cuda_hpc PROPERTIES
  BUILD_RPATH "${NVHPC_PATH}/compilers/lib"
  INSTALL_RPATH "${NVHPC_PATH}/compilers/lib"
  INSTALL_RPATH_USE_LINK_PATH TRUE
)
I am so grateful to you for this. My intent is to use the HPC SDK in its entirety. I did not know the file should not be .cu, given that I am new to the HPC SDK. I am aware that GCC 11 can facilitate offload; I believe it is a two-step compilation with GCC, but I do not intend to go that route.
I will test and come back to you.
Thank you and best regards,
Hello Mat, I tried the modified CMake file and, after making sure I changed the extension of my source file to .cpp, I am still getting the same seg fault. I did run valgrind last time too, but did not provide the report because I thought I must be doing something very wrong for my program to compile and link but not execute. I see exactly the same error as last time. Below is my configure and build summary; please also find the valgrind.rpt file attached. Thank you and best wishes:
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ cmake …
– The CUDA compiler identification is NVIDIA 12.2.91
– The CXX compiler identification is NVHPC 23.7.0
– The C compiler identification is NVHPC 23.7.0
– Detecting CUDA compiler ABI info
– Detecting CUDA compiler ABI info - done
– Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/nvcc - skipped
– Detecting CUDA compile features
– Detecting CUDA compile features - done
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info - done
– Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/nvc++ - skipped
– Detecting CXX compile features
– Detecting CXX compile features - done
– Detecting C compiler ABI info
– Detecting C compiler ABI info - done
– Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/nvc - skipped
– Detecting C compile features
– Detecting C compile features - done
– CUDA version selected: 11.8
– Found OpenMP_C: -mp (found version “5.1”)
– Found OpenMP_CXX: -mp (found version “5.1”)
– Found OpenMP: TRUE (found version “5.1”)
– Configuring done (2.6s)
– Generating done (0.0s)
– Build files have been written to: /home/abhimehrish/Desktop/cuda_hpc/build
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ make
[ 50%] Building CXX object CMakeFiles/cuda_hpc.dir/main.cpp.o
[100%] Linking CXX executable cuda_hpc
[100%] Built target cuda_hpc
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ ls
CMakeCache.txt CMakeFiles cmake_install.cmake cuda_hpc Makefile
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ ./cuda_hpc
Segmentation fault (core dumped)
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ valgrind --leak-check=yes --log-file=valgrind.rpt ./cuda_hpc
Segmentation fault (core dumped)
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$
valgrind.txt (2.6 KB)
Since I can't recreate the segv, I don't know specifically what's going on, though the segv occurs on the host, in this case in a "freeres" call. My best guess is that it's a problem with the program and I'm just getting lucky that it doesn't segv for me.
Looking at the code, it appears you've used our "tcufft2dompc4.cpp" example as the basis. However, that example must use CUDA Managed Memory, and the data needs to be put on the heap (i.e. allocated) rather than on the stack. You modified the code so the complex arrays are on the stack and use OpenMP data regions. I'd recommend you try the original example to see if it works for you, though be sure to also add the "-gpu=managed" flag so managed memory is enabled.
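For reference, here is a condensed sketch of that managed-memory pattern: heap-allocated complex data, OpenMP target offload for the compute, and cuFFT called on the same pointers. It is only an illustration of the idea, not the shipped tcufft2dompc4.cpp, and it assumes a build line along the lines of nvc++ -fast -cuda -mp=gpu -gpu=managed -cudalib=cufft:

// Illustrative sketch only; not the shipped tcufft2dompc4.cpp example.
// Assumed build line: nvc++ -fast -cuda -mp=gpu -gpu=managed -cudalib=cufft sketch.cpp
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int nx = 256, ny = 256;
    const int n  = nx * ny;

    // Heap allocations: with -gpu=managed these are backed by CUDA Managed
    // Memory, so the same pointers are valid on both host and device.
    cufftComplex *in  = new cufftComplex[n];
    cufftComplex *out = new cufftComplex[n];

    // Initialize the data on the device with OpenMP target offload.
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i) {
        in[i].x = 1.0f;
        in[i].y = 0.0f;
    }

    // Because the memory is managed, the pointers can be handed straight to cuFFT.
    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);
    cufftExecC2C(plan, in, out, CUFFT_FORWARD);
    cudaDeviceSynchronize();

    printf("out[0] = (%e, %e)\n", out[0].x, out[0].y);

    cufftDestroy(plan);
    delete[] in;
    delete[] out;
    return 0;
}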
If you look at the "tcufft2dompc5.cpp" example, it uses data regions instead of managed memory, but then needs to have pointers to the complex data rather than using the complex containers directly. It also uses 2D arrays, so you'd need to translate the pattern to 1D arrays.
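A similarly condensed sketch of that data-region approach, again not the shipped tcufft2dompc5.cpp, maps the pointer-based arrays explicitly and hands their device addresses to cuFFT through a use_device_ptr clause (the helper function name and the assumed -mp=gpu -cudalib=cufft build are mine):

// Illustrative sketch of the data-region pattern; not the shipped tcufft2dompc5.cpp.
// Assumed build line: nvc++ -fast -cuda -mp=gpu -cudalib=cufft sketch.cpp
#include <cufft.h>
#include <cuda_runtime.h>

void fwd_fft(cufftComplex *in, cufftComplex *out, int nx, int ny) {
    const int n = nx * ny;
    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);

    // Explicitly map the heap arrays; device copies exist inside this region.
    #pragma omp target data map(tofrom: in[0:n]) map(from: out[0:n])
    {
        // Initialize on the device.
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i) {
            in[i].x = 1.0f;
            in[i].y = 0.0f;
        }

        // use_device_ptr lets the host code see the device addresses of the
        // mapped arrays, which is what cuFFT needs.
        #pragma omp target data use_device_ptr(in, out)
        {
            cufftExecC2C(plan, in, out, CUFFT_FORWARD);
            cudaDeviceSynchronize();
        }
    } // "out" is copied back to the host here.

    cufftDestroy(plan);
}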
I have managed to make tcufft2dompc4.cpp and tcufft2dompc5.cpp work. I looked at the accompanying makefile in the examples and realised that the flags for tcufft2dompc4.cpp are exactly what you suggested. For tcufft2dompc5.cpp, the makefile suggests removing the limit on the stack to avoid segmentation faults and using the flags CXXFLAGS ?= -fast -mp=gpu -cudalib=cufft, for the reasons you elaborated in your reply, and tcufft2dompc3.cpp just uses the nvc++ compiler? In any case, I am very grateful to you for bringing me this far. To run tcufft2dompc5.cpp, all I did was change the flags to:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -cuda -fast -mp=gpu -cudalib=cufft")
and then compiled and linked my executable as normal. Before running the compiled binary, I ran this command: ulimit -s unlimited
To summarise:
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ ulimit -s unlimited
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ ./cuda_hpc "$@"
Initializing data on device, iteration 0…
Initializing data on device, iteration 1…
Initializing data on device, iteration 2…
Initializing data on device, iteration 3…
Computing, iteration 0…
Computing, iteration 1…
Computing, iteration 2…
Computing, iteration 3…
Finalizing data, iteration 0…
Finalizing data, iteration 1…
Finalizing data, iteration 2…
Finalizing data, iteration 3…
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
Max error Convolution: (0.0000000e+00,0.0000000e+00)
Test PASSED
I will try the third example too, but if it's OK, please help with suggestions for changes to the CMake file for example 3. Thank you!
I don't think there's anything wrong with the CMake file. While I can't reproduce the segv and I get correct answers, I do see multiple valgrind errors when running the tcufft2dompc3.cpp example, as well as the equivalent OpenACC and C examples. Fortran is fine. Hence there might be something wrong with the test itself and it just happens to segv in your environment.
Let me report it to engineering to see if the valgrind issues are benign or could cause issues. For now, stick to the tcufft2dompc4.cpp example as your guide.
Thank you for your prompt reply. I was thinking about having a go at the OpenACC examples too, but will wait now until Engineering gets back to you. Below is the output of tcufft2dompc4.cpp, so 4 and 5 with OpenMP are definitely working and the tests are passing. I cannot make tcufft2dompc3.cpp work; I get a seg fault when I run it. It compiles and links but doesn't run, not in my setup. Just to be clear, I am working with the 23.7 SDK and the 11.8 CUDA toolkit.
abhimehrish@linux-machine:~/Desktop/cuda_hpc/build$ ./cuda_hpc
Max error C2C FWD: (0.0000000e+00,0.0000000e+00)
Max error C2C INV: 0.0000000e+00
Max error R2C/C2R: 0.0000000e+00
Test PASSED
Understood. Though don’t wait for my engineers, it could be a bit before they get to it.
It would be interesting for you to try the OpenACC example since I see the same Valgrind issues with that version (though the example passes otherwise).
Hi Mat, I could make both OpenACC cpp files, tcufft2dc3.cpp and tcufft2dc4.cpp, work. I could also make the OpenMP file that was giving me a seg fault, i.e. tcufft2dompc3.cpp, work. I made a few additions to my .bashrc file. Everything associated with CUDA and the HPC SDK now looks like this:
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/11.8/openmpi4/openmpi-4.1.5/lib:$LD_LIBRARY_PATH
export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/11.8/openmpi4/openmpi-4.1.5/bin:$PATH
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib:$LD_LIBRARY_PATH
export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin:$PATH
export MANPATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/man:$MANPATH
export LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib:$LIBRARY_PATH
The CMake file doesn't change much compared with the OpenMP one. Besides getting rid of the MPI and OpenMP lines in the CMakeLists.txt file above, only the following changes are needed for the OpenACC examples:
find_package(OpenACC REQUIRED)
set(CMAKE_CUDA_FLAGS
"${CMAKE_CUDA_FLAGS} -ccbin ${NVHPC_PATH}/compilers/bin/nvc++ -Xcompiler -acc=gpu -gpu=nordc")
set(CMAKE_CXX_FLAGS "${OpenACC_CXX_FLAGS} -fast -acc=gpu -cudalib=cufft")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenACC_C_FLAGS} -fast -acc=gpu")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")
The flags -fast -acc=gpu -cudalib=cufft are per the makefile recommendation for this specific example, i.e. tcufft2dc3.cpp. I think that for this example, and for OpenMP examples 3 and 5, it is very important to remove the stack limit with "ulimit -s unlimited" before executing the compiled binary, otherwise you will get a seg fault despite correct linking. I did not check for valgrind errors. All tests are passing. If and when you get a reply from your engineering team, please do let me know; it would be good to know if there are any dos and don'ts. Hopefully this thread on CMake will be useful for many. Thank you for your time.
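PS, for anyone who lands on this thread later: as far as I understand it, the source-side OpenACC pattern that pairs with these flags is the OpenACC analogue of the OpenMP one above, i.e. a data region plus a host_data use_device block around the cuFFT calls so that the library receives device addresses. Below is my own rough sketch of that idiom, not the shipped tcufft2dc3.cpp (the helper function name is a placeholder, and I added -cuda to the assumed build line because the sketch also calls the CUDA runtime directly):

// Rough illustration of the OpenACC/cuFFT interop pattern; not the shipped tcufft2dc3.cpp.
// Assumed build line: nvc++ -fast -cuda -acc=gpu -cudalib=cufft sketch.cpp
#include <cufft.h>
#include <cuda_runtime.h>

void fwd_fft(cufftComplex *in, cufftComplex *out, int nx, int ny) {
    const int n = nx * ny;
    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);

    // "in" is assumed to be initialized on the host by the caller; the data
    // region copies it to the device and copies "out" back on exit.
    #pragma acc data copy(in[0:n]) copyout(out[0:n])
    {
        // host_data exposes the device addresses of the mapped arrays to the
        // host-side cuFFT calls.
        #pragma acc host_data use_device(in, out)
        {
            cufftExecC2C(plan, in, out, CUFFT_FORWARD);
            cudaDeviceSynchronize();
        }
    }
    cufftDestroy(plan);
}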