NVCC Compile Shared Library

I want to use NVCC to compile the shared library first and then link it with the main program.

nvcc --compiler-options '-fPIC' -o liblibrary.so --shared library.cu
nvcc -Xcompiler -I./ main.cu -L. -llibrary
nvcc -I./ main.cu -L. -llibrary
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH 
./a.out

There are several files.

library.cuh

#ifndef LIB1_H_INCLUDED
#define LIB1_H_INCLUDED

void print_value ( int x );
__global__ void cuda_hello();

#endif /* LIB1_H_INCLUDED */

library.cu

#include "library.cuh"
#include <stdio.h>

void print_value( int x )
{
    printf("%d\n", x);
}

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

main.cu

#include <stdio.h>
#include "library.cuh"

int main ( void )
{
    print_value(10);
    cuda_hello<<<1,1>>>(); 
    return 0;
}

However, cuda_hello does not print any output. Is there anything wrong? print_value prints its output correctly.

after:

cuda_hello<<<1,1>>>();

you should have:

cudaDeviceSynchronize();

Please also use proper CUDA error checking (google that, take the first hit, modify your code).

Hi, @Robert_Crovella
This is the code I tried, but there is still no output and no error message.

int main ( void )
{
    print_value(10);
    cuda_hello<<<1,1>>>(); 
    gpuErrchk( cudaDeviceSynchronize());
    gpuErrchk( cudaGetLastError());
    return 0;
}

where gpuErrchk is defined as:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

I recommend that you keep CUDA syntax out of the shared library interface.

$ cat library.cuh
#ifndef LIB1_H_INCLUDED
#define LIB1_H_INCLUDED

void print_value ( int x );
void cuda_hello();

#endif /* LIB1_H_INCLUDED */
$ cat library.cu
#include "library.cuh"
#include <stdio.h>

void print_value( int x )
{
    printf("%d\n", x);
}

__global__ void cuda_hello_kernel(){
    printf("Hello World from GPU!\n");
}

void cuda_hello(){
     cuda_hello_kernel<<<1,1>>>();
     cudaDeviceSynchronize();
}

$ cat main.cu
#include <stdio.h>
#include "library.cuh"

int main ( void )
{
    print_value(10);
    cuda_hello();
    return 0;
}
$ nvcc -Xcompiler -fPIC -o liblibrary.so --shared library.cu
$ nvcc -I. main.cu -L. -llibrary
$ LD_LIBRARY_PATH=. cuda-memcheck ./a.out
========= CUDA-MEMCHECK
10
Hello World from GPU!
========= ERROR SUMMARY: 0 errors
$

Hi @Robert_Crovella Thanks a lot!
Just have one more follow up question.

When should we use -rdc=true during compilation?

When device code in one compilation unit is referencing or calling device code in another compilation unit. You have no examples of that here.
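For illustration, a minimal sketch of that situation (file and function names here are hypothetical, not from this thread): a __device__ function defined in one compilation unit is called from a kernel in another, which is what requires -rdc=true.

// square.cu -- defines a __device__ function
__device__ int square(int x) { return x * x; }

// caller.cu -- a kernel in a separate compilation unit calls it
extern __device__ int square(int x);

__global__ void use_square(int *out) { *out = square(3); }

$ nvcc -rdc=true -c square.cu caller.cu
$ nvcc -rdc=true square.o caller.o -o app

Without -rdc=true, the build should fail with an unresolved external device symbol, because whole-program compilation cannot see square() defined in the other unit.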

Does this apply to the case where I have the definition of a __device__ function in one .cu file, compile it into a .so, and then link that .so with another .cu file in a later compilation?

No, that won’t work, generally. It is a stated limitation that you cannot device-link across a .so boundary. You can device link across a static library boundary.
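For reference, a hedged sketch of the static-library route (file and library names are placeholders): compile with relocatable device code, archive, and let the device link happen when the final executable is produced.

# compile the library with relocatable device code and archive it
$ nvcc -rdc=true -c library.cu -o library.o
$ nvcc -lib library.o -o liblibrary.a

# the device link is performed here, when the executable is built,
# so device code in main.cu can call device code from the static library
$ nvcc -rdc=true main.cu -L. -llibrary -o a.out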

You can device link within a .so, just not across the boundary/interface.
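A sketch of what that could look like (again with hypothetical file names), assuming nvcc performs the device link while it builds the shared library: all device code that references other device code lives inside the same .so, and the .so exposes only host entry points.

# part1.cu's kernel calls a __device__ function defined in part2.cu;
# the device link is resolved while the .so itself is produced
$ nvcc -rdc=true -Xcompiler -fPIC --shared part1.cu part2.cu -o liblibrary.so

# the main program uses only the host wrapper functions exported by the .so
$ nvcc -I. main.cu -L. -llibrary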

Got it. Now I understand why CUDA-based libraries such as cuDNN, cuBLAS, and NCCL only offer a host (CPU) API instead of a __global__ function API.

However, I found one exception: NVSHMEM. It also offers device-level APIs for a thread/warp/block that can be called directly from a __global__ kernel, which is quite different from the CUDA libraries above.

nvshmem is implemented via a static library. As we’ve already discussed, you can do more-or-less anything with a static library.