Introducing CUDA into a complex C++ code base that uses the CmdStan middleware

Dear all,

I’m on a mission to introduce CUDA functionality into an existing C++ code base. I tried to go by the book, following a simple example on GitHub:

and in a blog post:

The complications in my case are the following: (a) the code base uses MPI, and the final executable is built with mpic++; (b) the code base uses the CmdStan middleware and its extensive math library, StanMath; (c) due to peculiarities of the middleware, all C++ user code is declared and written in header files (*.hpp); and (d) StanMath relies heavily on the Boost and Eigen libraries.

First of all, could you please check whether my programming logic is correct:
a) I introduced a simple CUDA kernel called update_weights.
b) Its parent function, hmc_proposal, launches update_weights with the CUDA execution-configuration syntax update_weights<<<N,M>>>(x,y,z).
c) Both update_weights and hmc_proposal are moved to a separate source code file called kernels.cu.
d) The signatures of update_weights and hmc_proposal are declared in a dedicated CUDA header file, kernels.cuh.
e) The function update_weights is decorated with the __global__ qualifier in both the source and the header file (*.cu, *.cuh).
f) Header file kernels.cuh is included into the CUDA source file kernels.cu.
g) I use a forward declaration: the signature of the parent function hmc_proposal is declared at the beginning of the ordinary C++ header file hybrid_smc_method.hpp.
h) I compile the C++ code base with the GNU compiler g++.

i) I compile the CUDA source file kernels.cu into the object file kernels.o with nvcc:
$(NVCC) $(NVCC_FLAGS) -x cu -dc kernels.cu -o kernels.o

j) I prepare the CUDA object file for linking by a ‘foreign’ (host) compiler with a device-link step:
$(NVCC) $(NVCC_FLAGS) -dlink kernels.o -o kernels_lnk.o $(NVCC_LDLIBS)

k) Finally, I link everything together with mpic++, which wraps g++:
$(MPICXX) $(CXXFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) *.o kernels_lnk.o kernels.o main.o $(LDLIBS) $(LIBSUNDIALS) $(NVCC_LDFLAGS) $(NVCC_LDLIBS) $(MPI_TARGETS) $(TBB_TARGETS)
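A minimal sketch of how the pieces from steps c) to g) could fit together. The signature of hmc_proposal (raw double pointers plus a length) and the weight update rule are hypothetical, since the real ones are not shown. Because the linker error names stan::smcs::hmc_proposal, the forward declaration presumably sits inside namespace stan::smcs; the definition in kernels.cu then has to live in that same namespace with an identical parameter list, otherwise it defines a different function and the reference stays unresolved. The part of the header that g++ sees should be plain C++, since g++ does not understand __global__. The #ifdef __CUDACC__ guard and the CPU fallback exist only so this single listing also compiles with g++; a real kernels.cu would keep just the CUDA branch and add error checking on the CUDA calls.

```cpp
#include <cassert>
#include <cstddef>

// ---- kernels.cuh: host-visible declaration, plain C++ only (g++ cannot parse
// __global__), so hybrid_smc_method.hpp could include it instead of repeating it.
namespace stan {
namespace smcs {
// Hypothetical signature; it must match the caller's declaration exactly:
// same namespace, same name, same parameter types.
void hmc_proposal(double* weights, const double* log_lik, std::size_t n);
}  // namespace smcs
}  // namespace stan

// ---- kernels.cu (compiled with nvcc -dc)
#ifdef __CUDACC__
// Placeholder update rule: add each log-likelihood increment to its weight.
__global__ void update_weights(double* w, const double* ll, std::size_t n) {
  std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) w[i] += ll[i];
}
#endif

namespace stan {
namespace smcs {
// Defined in the same namespace as the declaration; a definition at global
// scope would be a different function (::hmc_proposal), leaving
// stan::smcs::hmc_proposal undefined at link time.
void hmc_proposal(double* weights, const double* log_lik, std::size_t n) {
  assert(n == 0 || (weights != nullptr && log_lik != nullptr));
  if (n == 0) return;  // a launch with zero blocks would be an error
#ifdef __CUDACC__
  const std::size_t bytes = n * sizeof(double);
  double* d_w = nullptr;
  double* d_ll = nullptr;
  cudaMalloc(&d_w, bytes);  // error checking omitted for brevity
  cudaMalloc(&d_ll, bytes);
  cudaMemcpy(d_w, weights, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_ll, log_lik, bytes, cudaMemcpyHostToDevice);
  const unsigned threads = 256;
  const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
  update_weights<<<blocks, threads>>>(d_w, d_ll, n);
  // Blocking copy on the default stream waits for the kernel to finish.
  cudaMemcpy(weights, d_w, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_w);
  cudaFree(d_ll);
#else
  // Host fallback so this single listing also compiles with a plain C++ compiler.
  for (std::size_t i = 0; i < n; ++i) weights[i] += log_lik[i];
#endif
}
}  // namespace smcs
}  // namespace stan
```

Including one shared declaration header in both kernels.cu and hybrid_smc_method.hpp, rather than retyping the signature, lets the compiler report a mismatch instead of leaving it to the linker.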

However, linking fails with an undefined reference to `void stan::smcs::hmc_proposal`. It seems my forward declaration doesn’t work. Could you please verify my compilation steps and help me link everything together?
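The leading void in the reported name may also be a hint. With g++ (Itanium C++ ABI), the return type is part of the mangled name only for function template specializations, so ordinary functions are reported without it. If hmc_proposal is a function template (common in Stan code), kernels.cu contains only the template definition and never emits the specialization that hybrid_smc_method.hpp needs, unless kernels.cu requests it with an explicit instantiation. The sketch below assumes a hypothetical template parameter T and instantiates it for double only as an example; every type combination the caller uses would need its own instantiation. Running nm -C kernels.o | grep hmc_proposal shows which hmc_proposal symbols kernels.o actually provides. The file split is marked with comments because the listing is a single translation unit.

```cpp
#include <cassert>
#include <cstddef>

// ---- hybrid_smc_method.hpp (compiled by g++): declaration only.
namespace stan {
namespace smcs {
// Hypothetical template signature; the real one is not shown in the question.
template <typename T>
void hmc_proposal(T* weights, const T* log_lik, std::size_t n);
}  // namespace smcs
}  // namespace stan

// ---- kernels.cu (compiled by nvcc): definition plus explicit instantiation.
namespace stan {
namespace smcs {
template <typename T>
void hmc_proposal(T* weights, const T* log_lik, std::size_t n) {
  assert(n == 0 || (weights != nullptr && log_lik != nullptr));
  // Placeholder for the update_weights<<<...>>> launch.
  for (std::size_t i = 0; i < n; ++i) weights[i] += log_lik[i];
}

// Explicit instantiation definition: forces kernels.o to contain the
// double specialization, so callers in other object files can link to it.
template void hmc_proposal<double>(double*, const double*, std::size_t);
}  // namespace smcs
}  // namespace stan
```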