Separate compilation of mixed CUDA OpenACC code

najy76 · October 21, 2021, 7:15pm

I present a simple test program composed of a CUDA source file, an OpenACC source file, and a plain C++ main that calls some functions defined in the other two compilation units. This is a simplification of a bigger program composed of many CUDA and OpenACC source files.

I need to link the executable using separate compilation, with an intermediate device linking step followed by the final host linking step, as described in the NVIDIA blog. This need comes from other restrictions that are not reproducible with this sample code.

Here are the steps I take:

OPENACC_ARCH_FLAGS="-acc=gpu -gpu=cc70 -acc=noautopar -Minfo=accel"
CUDA_ARCH_FLAGS="--generate-code=arch=compute_70,code=[compute_70,sm_70]"
CUDA_LIB_DIR=$HPC_SDK_HOME/Linux_ppc64le/2021/cuda/lib64 # customize your path

# Compile MPI C++ code
pgc++ -c main.cpp -o main.cpp.o

# Compile OPENACC code
pgc++ $OPENACC_ARCH_FLAGS -c test_openacc.cpp -o openacc.cpp.o

# Compile CUDA code
nvcc $CUDA_ARCH_FLAGS -dc test_cuda.cu -o cuda.cu.o

# removing openacc.cpp.o from dlink object works without errors
DLINK_OBJS="cuda.cu.o main.cpp.o openacc.cpp.o" # <=== this causes the error
nvcc $CUDA_ARCH_FLAGS -dlink $DLINK_OBJS -o dlink.o

# Generate executable
nvc++ $OPENACC_ARCH_FLAGS -o main cuda.cu.o openacc.cpp.o main.cpp.o dlink.o -L$CUDA_LIB_DIR -lcudadevrt -lcudart_static -lr

PROBLEM: if I include the OpenACC object file in the device linking step, I get an unresolved symbol error during the final host linking stage:

undefined reference to `__fatbinwrap_98_test_openacc_cpp'
pgacclnk: child process exit status 1: /usr/bin/ld

QUESTION:

Why does including the openacc.cpp.o object in the dlink.o object produce this error at the host linking step?
Why does including the main.cpp.o object in the dlink.o object not produce any problem?

MatColgrove · October 21, 2021, 9:01pm

nvc++ compiles with RDC on by default and adding the flag “-gpu=nordc” to OPENACC_ARCH_FLAGS should fix the issue.
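A minimal sketch of that change, assuming the flags from the original post and assuming the nordc suboption is simply appended to the existing -gpu list (I haven't run this against the sample):

```shell
# Same OpenACC flags as in the question, with relocatable device code (RDC) disabled.
# Assumption: nordc is added as one more comma-separated -gpu suboption.
OPENACC_ARCH_FLAGS="-acc=gpu -gpu=cc70,nordc -acc=noautopar -Minfo=accel"

# The compile command itself stays as it was. It is shown as a comment so the
# snippet can be sourced on a machine without the NVIDIA HPC SDK installed:
#   nvc++ $OPENACC_ARCH_FLAGS -c test_openacc.cpp -o openacc.cpp.o
echo "$OPENACC_ARCH_FLAGS"
```

The rest of the build steps would keep using $OPENACC_ARCH_FLAGS unchanged.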

I’m not an expert in the inner workings of nvcc, but my guess is that when it creates the fat bin wrapper symbols, it doesn’t know how to interpret the OpenACC symbol names, so it doesn’t create the symbol name correctly. They should look something like “__fatbinwrap_98_tmpxft_00028c08_00000000_8_test_openacc_cpp”. Since nvcc doesn’t support OpenACC, though, I wouldn’t expect it to.
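One way to check this guess is to compare the fat bin wrapper symbols in each object file. The following is only a sketch: it assumes GNU nm is available and that the object files from the original build steps are in the current directory, and filter_fatbinwrap is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: keep only the fat-binary wrapper symbols from nm output.
# "|| true" keeps the function from failing when no such symbol is present.
filter_fatbinwrap() {
    grep '__fatbinwrap' || true
}

# Object names are the ones used in the build steps of the original post.
for obj in openacc.cpp.o cuda.cu.o dlink.o; do
    if [ -f "$obj" ]; then
        echo "== $obj =="
        # In nm output, "U" marks an undefined reference; other letters mark definitions.
        nm "$obj" | filter_fatbinwrap
    fi
done
```

If dlink.o lists a wrapper symbol as undefined and no other object defines a symbol with exactly that name, that would match the error reported by the host linker.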

As I noted on your SO post, the “dlink” step shouldn’t be necessary. nvc++ already includes the device linking so no need to separate it out. Something like the following should work:

OPENACC_ARCH_FLAGS="-acc=gpu -gpu=cc70 -acc=noautopar -Minfo=accel"
CUDA_ARCH_FLAGS="--generate-code=arch=compute_70,code=[compute_70,sm_70] -rdc"

# Compile MPI C++ code
nvc++ -c main.cpp -o main.cpp.o

# Compile OPENACC code
nvc++ $OPENACC_ARCH_FLAGS -c test_openacc.cpp -o openacc.cpp.o

# Compile CUDA code
nvcc $CUDA_ARCH_FLAGS -c test_cuda.cu -o cuda.cu.o

# Generate executable
nvc++ $OPENACC_ARCH_FLAGS -o main cuda.cu.o openacc.cpp.o main.cpp.o -cuda -static-nvidia -lr

najy76 · October 22, 2021, 10:39am

Thank you, Mat, for clarifying the default RDC behaviour of nvc++ and providing a solution using direct linking with nvc++.

Although the dlink step is not necessary with this simple example, I assumed it should work anyway. For example, as I posted on SO, when a CUDA project is configured with CMake using separate compilation, the build process always goes through the intermediate dlink step, followed by the final host linking step. That’s why I wanted to understand how to set things up properly for the intermediate dlink step.

Do you think I can submit a bug report to your NVIDIA support colleagues regarding the fact that nvcc produces a dlink.o object which cannot be linked with the host linker when dealing with an OpenACC object?
