I would like to use OpenMP device offload and CUDA in the same program. The code is templated, so the CUDA and OpenMP parts need to be compiled together. I've got a simple Makefile-based reproducer, about 100 lines of code, here: burlen/cuda_plus_offload on GitHub (test code to see if it is possible to compile both CUDA kernels and OpenMP device offload in the same program).
Depending on preprocessor defines, the code can be compiled with CUDA only, OpenMP offload only, or both CUDA and OpenMP offload. I've been trying to use nvcc with the host compiler set to nvc++; both come from the HPC SDK 2023 release. The CUDA-only and OpenMP-only cases each work individually. However, with CUDA and OpenMP together, it compiles but fails to link:
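For readers who can't open the repository, here is a hypothetical sketch of how a single source file can be switched between the three modes. It is not the reproducer's actual code. `init_omp` mirrors the `init_omp<float>(float*, unsigned long, float const&)` signature in the -Minfo output. `init_cuda` and `init_cuda_kernel` are illustrative names I made up, and the `map(from: ...)` clause is an assumption. Defining `CUMP_USE_CUDA` requires compiling the file as CUDA (for example with `-x cu`, as in the command below). With neither macro defined it is plain C++, and the loop runs serially on the host.

```cpp
// Hypothetical sketch, not the code in burlen/cuda_plus_offload.
// CUMP_USE_CUDA / CUMP_USE_OPENMP select which paths are compiled.
#include <cassert>
#include <cstddef>

#if defined(CUMP_USE_CUDA)
#include <cuda_runtime.h>

// CUDA path (hypothetical names): one thread per element.
template <typename T>
__global__ void init_cuda_kernel(T *data, std::size_t n, T val)
{
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = val;
}

// Fills a device buffer; dev_data must come from cudaMalloc.
template <typename T>
void init_cuda(T *dev_data, std::size_t n, const T &val)
{
    if (n == 0)
        return; // a zero-block launch would be an invalid configuration
    const unsigned int threads = 128;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
    init_cuda_kernel<T><<<blocks, threads>>>(dev_data, n, val);
    cudaDeviceSynchronize();
}
#endif

// OpenMP path: same signature as init_omp<float> in the -Minfo output.
// Without CUMP_USE_OPENMP (or without an offload-capable compiler) the
// pragma is absent/ignored and the loop runs on the host.
template <typename T>
void init_omp(T *data, std::size_t n, const T &val)
{
    assert(data != nullptr || n == 0);
#if defined(CUMP_USE_OPENMP)
#pragma omp target teams loop map(from: data[0:n])
#endif
    for (std::size_t i = 0; i < n; ++i)
        data[i] = val;
}
```

Because both functions are templates, they are only instantiated where they are called. That is why, in a layout like this, the CUDA and OpenMP code have to be visible to the same compiler invocation.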
nvcc -g -G --generate-code=arch=compute_75,code=[compute_75,sm_75] -lcuda -lcudart -lcudadevrt -ccbin=`which nvc++` -DCUMP_USE_OPENMP -DCUMP_USE_CUDA -Xcompiler -g,-Mcuda,-mp=gpu,-gpu=cc75,-Minfo=mp,-lcuda,-lcudart,-lcudadevrt,-Mcuda,-pgf90libs -Xlinker -lcuda,-lcudart,-lcudadevrt -x cu main.cpp -o cump_both_nvhpc
void init_omp<float>(float*, unsigned long, float const&):
1, include "stl_construct.h"
33, #omp target teams loop
33, Generating "nvkernel__Z8init_ompIfEvPT_mRKS0__F16399L33_2" GPU kernel
Generating NVIDIA GPU code
35, Loop parallelized across teams, threads(128) /* blockIdx.x threadIdx.x */
33, Generating Multicore code
35, Loop parallelized across threads
/bin/ld: /tmp/tmpxft_00282bd1_00000000-13_cump_both_nvhpc_dlink.o: in function `__cudaRegisterLinkedBinary__NV_MODULE_ID':
/tmp/tmpxft_00282bd1_00000000-7_cump_both_nvhpc_dlink.reg.c:2: undefined reference to `__fatbinwrap__NV_MODULE_ID'
pgacclnk: child process exit status 1: /bin/ld
make: *** [Makefile.nvhpc:12: cump_both_nvhpc] Error 2
Can anyone help with the link options needed?