Shared Library - Separate Compilation C, C++

Hey guys, I’m having a problem trying to create a dynamic shared library (.so) using CUDA. The kernel is written in C (.cu) and the caller is written in C++ (*.cpp).

One of the functions defined in my shared library is setParams().
However, I get this error when I load the shared library with dlopen():

load ./
loading: './'
./ undefined symbol: setParams

This is the break-up of my code:
matrix.cpp - contains a call to the .cu function:

// this function is defined in the .cu file
extern void matrixMulOnDevice(Matrix* A, Matrix* B, Matrix* C);

// this function calls matrixMulOnDevice
void run()

// this file also contains this and other functions
int setParams(const char* buffer)

the .cu file - contains the kernel and the definition of matrixMulOnDevice():

__global__ void matrixMulCUDA(Matrix* A, Matrix* B, Matrix* C)

void matrixMulOnDevice(Matrix* A, Matrix* B, Matrix* C)
   // does a call to matrixMulCUDA

So the problem is that I can’t seem to compile and combine these 2 files (matrix.cpp and the .cu file) into a shared library without getting the undefined symbol error when I load the library.

This is how I tried to compile:

nvcc -arch sm_20 --compiler-options '-fPIC' -dc -o matrixKernel.o
nvcc -arch sm_20 --compiler-options '-fPIC' -dc matrix.cpp -o matrixPlugin.o -I ../
nvcc -arch sm_20 --compiler-options '-fPIC' -dlink matrixKernel.o matrixPlugin.o -o matrix.o
g++ -shared -o matrixKernel.o matrixPlugin.o matrix.o -lcudart

What am I doing wrong here? Can somebody help me to compile this the right way?

Was your application (from which you are calling dlopen()) linked against this shared library? If so what was in the header file used to provide the prototype for the various library-exposed functions? Why do you need to use dlopen() instead of letting the library load occur automatically?

I’m creating a plugin-based application, so the user should be able to load any dynamic shared library into the program at any time. In other words, this library is not essential for the functionality of the program, but once the user loads it, he or she should be able to run its program.

I was able to get other “plugins” working without problems. I’ve created a hello plugin which, when its run() function is called, displays “hello world” on the screen.
This is how I create it:

cc -Wall -O0 -g -fPIC -I ../ -c -o hello.o hello.c
cc  -shared -o hello.o

Once I start the program that handles the plugins, I can type the “load” and “run” commands and it will execute:

load ./
run hello
Hell World!

So I’m trying to get CUDA shared libraries to load in the same fashion, but dlopen() fails saying that it can’t find some of the functions (hence the “undefined symbol” errors).
I think that I am just not compiling / linking the CUDA object correctly.
If anyone can point me in the right direction for loading the library successfully, I would appreciate it.
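For reference, the loading step described above can be sketched roughly as follows. This is a minimal illustration, not the actual host program: the helper name loadSymbol is hypothetical, and it assumes the host looks up each entry point by its plain name with dlsym(). A lookup like this is where a name mismatch between the host and the library shows up.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: opens the shared library at `path` and returns the
// address of `symbol`, or nullptr if the library or the symbol is missing.
// On success the library is deliberately left loaded so the returned
// address stays valid.
void* loadSymbol(const char* path, const char* symbol) {
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }

    dlerror();  // clear any stale error before calling dlsym()
    void* address = dlsym(handle, symbol);
    const char* error = dlerror();
    if (error != nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", error);
        dlclose(handle);
        return nullptr;
    }
    return address;
}
```

A host would cast the result to the expected function pointer type, for example int (*)(const char*) for setParams, before calling it. On older glibc versions the program has to be linked with -ldl.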

Try the following: examine the library with nm to see what symbols are defined and referenced.
e.g. nm # lists all symbols
nm -u # lists undefined symbols
Also check these against the symbols in the individual .o files.

My guess as to the issue is that the symbols in matrix.cpp are being left out of the link stage because they are not referenced. You might try -u (undefined symbol name) to force a reference so the .o is included.

THANK YOU @yodacallmesome, you are AWESOME!!!

With your help I was able to dig into the problem, and the fix was to add extern "C" to those plugin functions:

extern "C"

I realized that the function symbols were mangled when compiled with C++, but they were normal when compiled with C.

In my example (which is purely C, no CUDA included), I get the following:

#cc -Wall -O0 -g -fPIC -I ../ -c -o hello.o hello.c
#cc  -shared -o hello.o
#nm | grep "T "
0000000000000c7c T _fini
0000000000000b0f T getParams
0000000000000c6b T getRunTime
00000000000007f0 T _init
0000000000000c5e T queryParamInfo
0000000000000bf4 T run
0000000000000985 T setParams

But my CUDA library (which was compiled with g++) was outputting this:

#nm | grep "T "
000000000000500f T __cudaRegisterLinkedBinary_41_tmpxft_00002f63_00000000_6_matrix_cpp1_ii_0b951eab
000000000000506c T _fini
0000000000002f78 T _init
0000000000004933 T _Z10freeMatrixP6Matrix
000000000000430e T _Z10getRunTimev
000000000000486c T _Z13displayMatrixP6Matrix
0000000000004301 T _Z14queryParamInfov
000000000000431e T _Z14readMatrixFilePc
00000000000045fc T _Z15writeMatrixFilePcP6Matrix
000000000000378a T _Z17matrixMulOnDeviceP6MatrixS0_S0_
00000000000036c5 T _Z20allocateDeviceMatrixP6MatrixS0_
00000000000040d8 T _Z3runv
0000000000003ffc T _Z9getParamsPci
0000000000003e90 T _Z9setParamsPKc
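
The mangled names in the listing above can be decoded to see which functions they stand for. Here is a small sketch using abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang); the helper name demangle is just for illustration.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the human-readable form of a mangled C++ symbol name.
// If demangling fails, the input is returned unchanged.
std::string demangle(const char* symbol) {
    int status = 0;
    char* readable = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return symbol;
    }
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the buffer with malloc()
    return result;
}
```

For example, demangle("_Z9setParamsPKc") gives "setParams(char const*)", which is why a lookup for the plain name setParams finds nothing. Running nm with the -C option or piping its output through c++filt shows the same information.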

So, digging around, I found out that I needed extern "C" in front of my functions in the C++ program.
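
A minimal sketch of what that looks like in the C++ file is shown here. The function bodies, the storedParams buffer, and the return convention (0 on success, -1 on rejected input) are assumptions made for illustration; only the names setParams and run come from the plugin above.

```cpp
#include <cstdio>
#include <cstring>

namespace {
char storedParams[256] = "";  // hypothetical parameter storage for this sketch
}

extern "C" {

// Exported as the plain symbol "setParams" instead of _Z9setParamsPKc.
// Assumed convention: returns 0 on success, -1 if the input is rejected.
int setParams(const char* buffer) {
    if (buffer == nullptr || std::strlen(buffer) >= sizeof(storedParams)) {
        return -1;
    }
    std::strcpy(storedParams, buffer);
    return 0;
}

// Exported as the plain symbol "run" instead of _Z3runv.
void run() {
    std::printf("running with params: %s\n", storedParams);
}

}  // extern "C"
```

Only the entry points that the host looks up by name need C linkage. Internal helpers such as matrixMulOnDevice can keep their C++ names, as long as the .cpp and .cu files are both compiled as C++ and therefore agree on the mangled name.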