NVCC Compile Shared Library

I want to use NVCC to compile the shared library first and then link it with the main program.

nvcc --compiler-options '-fPIC' -o liblibrary.so --shared library.cu
nvcc -Xcompiler -I./ main.cu -L. -llibrary
nvcc -I./ main.cu -L. -llibrary
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH 
./a.out

There are several files.

library.cuh

#ifndef LIB1_H_INCLUDED
#define LIB1_H_INCLUDED

void print_value ( int x );
__global__ void cuda_hello();

#endif /* LIB1_H_INCLUDED */

library.cu

#include "library.cuh"
#include <stdio.h>

void print_value( int x )
{
    printf("%d\n", x);
}

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

main.cu

#include <stdio.h>
#include "library.cuh"

int main ( void )
{
    print_value(10);
    cuda_hello<<<1,1>>>(); 
    return 0;
}

However, cuda_hello does not print any output. Is there anything wrong? print_value prints its output correctly.

after:

cuda_hello<<<1,1>>>();

you should have:

cudaDeviceSynchronize();

Please also use proper CUDA error checking (google that, take the first hit, modify your code).

Hi, @Robert_Crovella
This is the code I tried, but there is still no output and no error message.

int main ( void )
{
    print_value(10);
    cuda_hello<<<1,1>>>(); 
    gpuErrchk( cudaDeviceSynchronize());
    gpuErrchk( cudaGetLastError());
    return 0;
}

where gpuErrchk is defined as:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

I recommend that you keep CUDA syntax out of the shared library interface.

$ cat library.cuh
#ifndef LIB1_H_INCLUDED
#define LIB1_H_INCLUDED

void print_value ( int x );
void cuda_hello();

#endif /* LIB1_H_INCLUDED */
$ cat library.cu
#include "library.cuh"
#include <stdio.h>

void print_value( int x )
{
    printf("%d\n", x);
}

__global__ void cuda_hello_kernel(){
    printf("Hello World from GPU!\n");
}

void cuda_hello(){
     cuda_hello_kernel<<<1,1>>>();
     cudaDeviceSynchronize();
}

$ cat main.cu
#include <stdio.h>
#include "library.cuh"

int main ( void )
{
    print_value(10);
    cuda_hello();
    return 0;
}
$ nvcc -Xcompiler -fPIC -o liblibrary.so --shared library.cu
$ nvcc -I. main.cu -L. -llibrary
$ LD_LIBRARY_PATH=. cuda-memcheck ./a.out
========= CUDA-MEMCHECK
10
Hello World from GPU!
========= ERROR SUMMARY: 0 errors
$

Hi @Robert_Crovella Thanks a lot!
Just have one more follow up question.

When should we use -rdc=true during compilation?

When device code in one compilation unit is referencing or calling device code in another compilation unit. You have no examples of that here.
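For illustration, a minimal sketch of that situation (file and function names here are hypothetical, not from this thread): a __device__ function defined in one compilation unit is called from a kernel in another, which is what requires -rdc=true.

// square.cu -- defines a __device__ function
__device__ int square(int x) { return x * x; }

// caller.cu -- a kernel in a separate compilation unit calls it
extern __device__ int square(int x);

__global__ void use_square(int *out) { *out = square(3); }

$ nvcc -rdc=true -c square.cu caller.cu
$ nvcc -rdc=true square.o caller.o -o app

Without -rdc=true, the build should fail with an unresolved external device symbol, because whole-program compilation cannot see square() defined in the other unit.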

Does this apply to the case where I have the definition of a __device__ function in one .cu file, compile it into a .so, and then link that .so with another .cu file in a later compilation?

No, that won’t work, generally. It is a stated limitation that you cannot device-link across a .so boundary. You can device link across a static library boundary.
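For reference, a hedged sketch of the static-library route (file and library names are placeholders): compile with relocatable device code, archive, and let the device link happen when the final executable is produced.

# compile the library with relocatable device code and archive it
$ nvcc -rdc=true -c library.cu -o library.o
$ nvcc -lib library.o -o liblibrary.a

# the device link is performed here, when the executable is built,
# so device code in main.cu can call device code from the static library
$ nvcc -rdc=true main.cu -L. -llibrary -o a.out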

You can device link within a .so, just not across the boundary/interface.
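A sketch of what that could look like (again with hypothetical file names), assuming nvcc performs the device link while it builds the shared library: all device code that references other device code lives inside the same .so, and the .so exposes only host entry points.

# part1.cu's kernel calls a __device__ function defined in part2.cu;
# the device link is resolved while the .so itself is produced
$ nvcc -rdc=true -Xcompiler -fPIC --shared part1.cu part2.cu -o liblibrary.so

# the main program uses only the host wrapper functions exported by the .so
$ nvcc -I. main.cu -L. -llibrary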

Got it. Now I understand why CUDA-based libraries such as cuDNN, cuBLAS, and NCCL only offer a host (CPU) API instead of a __global__ function API.

However, I found one exception: NVSHMEM. It also offers device-level APIs for a thread/warp/block that can be called directly from a __global__ kernel, which is quite different from the CUDA libraries above.

nvshmem is implemented via a static library. As we’ve already discussed, you can do more-or-less anything with a static library.