How to link host code with a static CUDA library after separable compilation?

Alright, I have a really troubling CUDA 5.0 question about how to link things properly. I’d be really grateful for any assistance!

Using the separable compilation features of CUDA 5.0, I generated a static library (*.a). This links nicely with other *.cu files when run through nvcc; I have done this many times.

I’d now like to take a *.cpp file and link it against the host code in this static library using g++ or whatever, but not nvcc. If I attempt this, I get linker errors like “undefined reference to __cudaRegisterLinkedBinary”.

I’m using both -lcuda and -lcudart and, to my knowledge, have the libraries in the correct order (meaning -lmylib -lcuda -lcudart). I don’t think it is an issue with that. Maybe I’m wrong, but I feel I’m missing a step and that I need to do something else to my static library (device linking?) before I can use it with g++.

Have I missed something crucial? Is this even possible?

Bonus question: I want the end result to be a dynamic library. How can I achieve this?

Here’s how you can build a shared library:

Let’s say you have some code you would like to compile with g++ (src/p2.cpp) and some other code you would like to compile with nvcc (src/p1.cu) and you want the object files and the shared library in the folder “build”:

Create object file with nvcc:
nvcc -c -I/path/to/header/files/of/shared/library -o build/p1.o src/p1.cu --compiler-options -fPIC,-Wall

Create object file with g++:
g++ -c -fPIC -Wall -I/path/to/header/files/of/shared/library -o build/p2.o src/p2.cpp

Create a shared library from those two object files:
g++ -shared -Wl,-soname,libchooseaname.so -o build/libchooseaname.so build/p1.o build/p2.o -lc -L/usr/local/cuda/lib64 -lcuda -lcudart

You will probably have to change the path to your CUDA libraries in the last step and, of course, the path to your header files and the name of the shared library.

Edit:
I forgot the part about using the shared library. Let’s say you have a source file main.cpp that uses your shared library. This is how you would compile and link main.cpp (capital -L gives the directory to search, small -l gives the library name without the “lib” prefix and the “.so” suffix):
g++ -Wall -I/path/to/header/files/of/shared/library -o main main.cpp -L/path/to/your/shared/library -lchooseaname

Don’t forget to #include the header files of your shared library in your main.cpp!

Edit 2:
It’s hard to distinguish between a small L and a capital i, and the code environment here seems to have a bug. It should be easy to read if you click on the quote button or copy and paste the text somewhere else.

Q_2: Thanks so much for your detailed reply. Unfortunately, I am already doing something like what you describe. Here are my commands in detail; I am trying to build a Python module around a CUDA static library.

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -I/usr/local/cuda-5.0/include -I/usr/include/python2.7 -c main.cpp -o main.o -fPIC

g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro main.o mycudalib.a -L/usr/local/cuda-5.0/lib64 -L/usr/local/cuda-5.0/lib -lmycudalib -lcuda -lcudart -lcudadevrt -o mysharedlib.so

Above, mycudalib.a is a device-linkable CUDA static library, main.cpp is a C++ source file (no CUDA) that refers to some of the host code in mycudalib.a, and mysharedlib.so is what I ultimately hope to create. I just don’t understand what I’m doing wrong.

For what it is worth, the error I get is “undefined symbol: __cudaRegisterLinkedBinary*” (* = random stuff) when I try to run the program (it does actually compile).
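
The missing symbol is consistent with the guess in the original question that a device-link step is needed: when nvcc does the final link it performs that step itself, while g++ does not. Below is a minimal sketch of doing it by hand, not a verified fix for this exact setup. It reuses the file names and CUDA path from the commands above. The architecture sm_20 and the output name mycudalib_dlink.o are assumptions, and it also assumes the objects inside mycudalib.a were compiled with -fPIC. The script only prints the two commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. File names and CUDA path come from the commands in this thread;
# ARCH and the name mycudalib_dlink.o are assumptions.
CUDA_HOME=/usr/local/cuda-5.0   # install prefix used in the question
ARCH=sm_20                      # assumed; should match the -arch used to build mycudalib.a

# Step 1: device-link the relocatable device code stored in the archive.
# -Xcompiler -fPIC makes the generated host object usable in a shared library.
DLINK_CMD="nvcc -arch=$ARCH -Xcompiler -fPIC -dlink mycudalib.a -o mycudalib_dlink.o"

# Step 2: host-link with g++, placing the device-link object next to the archive.
# Add -lcuda only if the code calls the driver API directly.
LINK_CMD="g++ -shared main.o mycudalib_dlink.o mycudalib.a -L$CUDA_HOME/lib64 -lcudadevrt -lcudart -o mysharedlib.so"

# Print the commands; run them only when RUN=1 is set in the environment.
for cmd in "$DLINK_CMD" "$LINK_CMD"; do
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd || exit 1
    fi
done
```

If this works, running nm -D mysharedlib.so should no longer list the __cudaRegisterLinkedBinary symbols as undefined (U).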