Nvcc and nvlink error

I am using CUDA 11.3. I want to build a .so file that both my host code and my device code can link against. Here is my program:

**cat a.h**

#ifndef A_H
#define A_H
#include <cuda_runtime.h>
#include <stdio.h>
class TestCudaHeader {
public:
    static __host__ __device__ void print_self();
};
#endif

**cat a.cu**
#include "a.h"
__host__ __device__ void TestCudaHeader::print_self() {
    printf("this is test_cuda_header\n");
}

**cat kernel.cu**
#include "a.h"
__global__ void test_gpu() {
    TestCudaHeader::print_self();
    return;
}
extern cudaError_t TEST_GPU() {
    test_gpu<<<1,1>>>();
    return cudaDeviceSynchronize();
}

**cat main.cpp**
#include "a.h"

extern cudaError_t TEST_GPU();
int main() {
    TestCudaHeader::print_self();
    TEST_GPU();
}

Here are my compile steps:

  1. nvcc -dc -Xcompiler -fPIC a.cu -o a.o

  2. nvcc -Xcompiler -fPIC -dlink a.o -o link.o

  3. nvcc -Xcompiler -fPIC -shared a.o link.o -o libtest.so

  4. nvcc -rdc=true kernel.cu main.cpp -o main -ltest

An error occurred:
nvlink error : Undefined reference to ‘_ZN14TestCudaHeader10print_selfEv’ in ‘/tmp/tmpxft_0000fd9a_00000000-12_kernel.o’
How can I solve it?

You cannot do device linking across a shared-library (.so) boundary, which is what you are trying to do.

To do device linking across a library boundary, that library must be a static library.

I have tried that, but it does not work either:

  1. nvcc -dc a.cu -o a.o

  2. nvcc -dlink a.o -o link.o

  3. nvcc --lib a.o link.o -o libtest.a

  4. nvcc -rdc=true kernel.cu main.cpp -o main -L./ -ltest

The error is the same: nvlink error : Undefined reference to ‘_ZN14TestCudaHeader10print_selfEv’ in ‘/tmp/tmpxft_00001d6a_00000000-10_kernel.o’

By the way, I also tried the steps below, and the problem still occurred.

  1. nvcc -dc a.cu -o a.o
  2. nvcc -dlink a.o -o link.o
  3. ar rcs libtest.a a.o link.o
  4. nvcc -rdc=true kernel.cu main.cpp -o main -L./ -ltest

nvlink error : Undefined reference to ‘_ZN14TestCudaHeader10print_selfEv’ in ‘/tmp/tmpxft_00005edb_00000000-10_kernel.o’.
Can you please help me solve this problem? I have been troubled by it for a long time.

The following sequence will work:

nvcc -dc a.cu -o a.o
ar rcs libtest.a a.o
nvcc -rdc=true main.cpp kernel.cu -o main -L./ -ltest

Before running the commands above, be sure to remove any .o files from your build environment, and also remove the previous libtest.a.

By way of explanation, the key point is that a device link should be performed only once per object. Put another way, multiple device links can be performed during a build, as long as they don’t overlap (i.e., they are independent).
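To make the "once per object" idea concrete, here is a rough sketch of the same build as the working sequence above, but with the device link written out as its own step instead of being left to the final nvcc command. The names dlink.o, kernel.o, main.o and the function build_static are mine, and the CUDA_HOME default is only a guess at a typical install path; adjust to your setup.

```shell
#!/bin/sh
# Sketch only: the working static-library build, with the single device link made explicit.
# Assumptions: a.cu, kernel.cu and main.cpp are in the current directory, and the
# CUDA toolkit lives under $CUDA_HOME (the default below is a guess, not a given).
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

build_static() {
    # Start clean so no stale, already device-linked objects are picked up.
    rm -f a.o kernel.o main.o dlink.o libtest.a main &&
    # Relocatable device code for the library; note there is NO -dlink here.
    nvcc -dc a.cu -o a.o &&
    ar rcs libtest.a a.o &&
    nvcc -dc kernel.cu -o kernel.o &&
    nvcc -c main.cpp -o main.o &&
    # The one and only device link: kernel.o together with a.o from libtest.a.
    nvcc -dlink kernel.o -L. -ltest -o dlink.o &&
    # Ordinary host link with g++, which performs no further device linking.
    g++ main.o kernel.o dlink.o -L. -ltest -L"$CUDA_HOME/lib64" -lcudart -o main
}

# Only attempt the build when the toolchain and the sources are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f a.cu ] && [ -f kernel.cu ] && [ -f main.cpp ]; then
    build_static
else
    echo "nvcc or sources missing; nothing built"
fi
```

Here a.o takes part in exactly one device link, the one that produces dlink.o, which is the same thing the final nvcc -rdc=true command does for you implicitly.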

By issuing this command:

nvcc -dlink a.o -o link.o

you were prematurely device-linking a.o.

Then later, when a.o actually needed to be device-linked together with kernel.cu (the final nvcc -rdc=true command performs that device link implicitly), it was “too late”.
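If a shared library is still the goal, one layout that stays within the rule above is to keep all device code that calls each other on the same side of the .so boundary and device-link it there, once. The sketch below moves kernel.cu into the library, which is my assumption about what is acceptable for your project and not something proposed earlier in this thread. The function name build_shared and the CUDA_HOME default are likewise placeholders.

```shell
#!/bin/sh
# Sketch only: a shared library whose device code is self-contained.
# Assumptions: kernel.cu may live inside the library, the sources are in the
# current directory, and the CUDA toolkit is under $CUDA_HOME (default is a guess).
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

build_shared() {
    rm -f a.o kernel.o link.o main.o libtest.so main &&
    # Position-independent relocatable device code for everything in the library.
    nvcc -Xcompiler -fPIC -dc a.cu -o a.o &&
    nvcc -Xcompiler -fPIC -dc kernel.cu -o kernel.o &&
    # Single device link, entirely inside the library.
    nvcc -Xcompiler -fPIC -dlink a.o kernel.o -o link.o &&
    g++ -shared a.o kernel.o link.o -L"$CUDA_HOME/lib64" -lcudart -o libtest.so &&
    # main.cpp contains only host code, so no device link crosses the .so boundary.
    nvcc -c main.cpp -o main.o &&
    g++ main.o -L. -ltest -L"$CUDA_HOME/lib64" -lcudart -o main
}

# Only attempt the build when the toolchain and the sources are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f a.cu ] && [ -f kernel.cu ] && [ -f main.cpp ]; then
    build_shared
else
    echo "nvcc or sources missing; nothing built"
fi
```

To run the result you will probably need the loader to find the library, for example with LD_LIBRARY_PATH=. ./main.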