Dear all,
I’m on a mission to introduce CUDA functionality to an existing C++ code base. I tried to go by the book using a simple example on GitHub:
and in a blog post:
The complications of my case are the following: (a) the code base uses MPI and the final executable is created with `mpic++`, (b) the code base uses middleware called CmdStan and its extensive math library called StanMath (the Stan Math Library), (c) due to peculiarities of the middleware, all the C++ user code is declared and written in header files (`*.hpp`), and (d) StanMath relies heavily on the Boost and Eigen libraries.
First of all, I would like to ask you to please check whether my programming logic is correct:
a) I introduced a simple CUDA kernel called `update_weights`.
b) Its parent function, `hmc_proposal`, calls `update_weights` with the CUDA launch syntax `update_weights<<<N,M>>>(x,y,z)`.
c) Both `update_weights` and `hmc_proposal` are moved to a separate source code file called `kernels.cu`.
d) The signatures of `update_weights` and `hmc_proposal` are declared in a dedicated CUDA header file `kernels.cuh`.
e) Function `update_weights` is decorated with the keyword `__global__` both in the source and header files (`*.cu`, `*.cuh`).
f) Header file `kernels.cuh` is included in the CUDA source file `kernels.cu`.
g) I use a forward declaration: the signature of the parent function `hmc_proposal` is declared at the beginning of the normal C++ source code file `hybrid_smc_method.hpp`.
h) I compile the C++ code base with the GNU compiler `g++`.
i) I compile the CUDA source file `kernels.cu` into an object file `kernels.o` with `nvcc`:
$(NVCC) $(NVCC_FLAGS) -x cu -dc kernels.cu -o kernels.o
j) I prepare the CUDA object file for linking with a ‘foreign’ compiler using a device-link step:
$(NVCC) $(NVCC_FLAGS) -dlink kernels.o -o kernels_lnk.o $(NVCC_LDLIBS)
k) Finally, I link everything together with the `mpic++` compiler, which wraps around `g++`:
$(MPICXX) $(CXXFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) *.o kernels_lnk.o kernels.o main.o $(LDLIBS) $(LIBSUNDIALS) $(NVCC_LDFLAGS) $(NVCC_LDLIBS) $(MPI_TARGETS) $(TBB_TARGETS)
However, the linking fails with an undefined reference to `void stan::smcs::hmc_proposal`. It seems my ‘forward declaration’ doesn’t work. Can you please verify my compilation steps and help me to link everything together?
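
To make the setup concrete, here is a minimal single-file sketch of how I understand the forward declaration and the definition have to line up. It is not my actual code: the signature (three pointers plus a length), the element-wise product inside `update_weights`, and the helper `run_proposal` are hypothetical, and the kernel is replaced by a plain host loop so that the sketch compiles with `g++` alone. As far as I understand, the linker can only resolve `stan::smcs::hmc_proposal` if the definition in `kernels.cu` sits in the same namespace and has exactly the same parameter list as the declaration in `hybrid_smc_method.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- would live in hybrid_smc_method.hpp (compiled by g++) ----
// Forward declaration. The linker looks for a symbol named
// stan::smcs::hmc_proposal with exactly this parameter list.
namespace stan {
namespace smcs {
void hmc_proposal(const double* x, const double* y, double* z, std::size_t n);
}  // namespace smcs
}  // namespace stan

// ---- would live in kernels.cu (compiled by nvcc) ----
// Hypothetical stand-in for the __global__ kernel: in the real file this
// would be launched with <<<N,M>>>; here it is an ordinary host loop.
static void update_weights(const double* x, const double* y, double* z,
                           std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    z[i] = x[i] * y[i];
  }
}

// The definition is placed in the SAME namespace as the declaration above.
// Defining it at global scope instead would produce the symbol
// ::hmc_proposal and leave stan::smcs::hmc_proposal undefined at link time.
namespace stan {
namespace smcs {
void hmc_proposal(const double* x, const double* y, double* z, std::size_t n) {
  assert(x != nullptr && y != nullptr && z != nullptr);
  update_weights(x, y, z, n);
}
}  // namespace smcs
}  // namespace stan

// ---- hypothetical caller somewhere in the C++ code base ----
inline std::vector<double> run_proposal(const std::vector<double>& x,
                                        const std::vector<double>& y) {
  assert(x.size() == y.size());
  std::vector<double> z(x.size());
  stan::smcs::hmc_proposal(x.data(), y.data(), z.data(), z.size());
  return z;
}
```

With this layout, calling `run_proposal` from the header-only code goes through the forward declaration and is resolved against the definition at link time, which is the behaviour I am trying to get with the real `kernels.o`.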