How to make a shared library?


To use existing CUDA Fortran code from Python, I tried to compile it (PGI v16.10) as a shared library on Linux (CentOS 6.8).
But I ran into trouble at step 2 when compiling the simple example below:

/usr/bin/ld: /tmp/pgcudaregDucexwm0fdvc.o: relocation R_X86_64_PC32 against undefined symbol `__pgi_cuda_register_fat_binary' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
pgacclnk: child process exit status 1: /usr/bin/ld

Do you have any suggestions?
Thank you in advance.

Compiling steps:
pgf90 -c -Mcuda -fpic cuda_so.cuf
pgf90 -o -Mcuda -shared cuda_so.o


module cuda_so_f90
use iso_c_binding
use cudafor
implicit none
  ! Module-level device data (resides in GPU memory)
  character(len=30, kind=c_char), device :: d_a, d_b

contains

  ! Host routine exported under the C name "test_so" so non-Fortran callers can find it
  subroutine test_so( ha, hb) bind(c, name="test_so")
  implicit none
    character(len=30, kind=c_char) :: ha, hb

    print*, 'hb into test_so :', hb
    print*, 'hb leaving test_so :', hb

  end subroutine test_so
end module cuda_so_f90

Hi cyfengMIT,

For shared objects and DLLs, you’ll need to disable relocatable device code (RDC) by adding the “-Mcuda=nordc” flag to both the compile and link steps. RDC requires a device linker, which is not available for dynamically loaded objects.

% pgf90 -c -Mcuda=nordc -fpic cuda_so.cuf
% pgf90 -o -Mcuda=nordc -shared cuda_so.o

Note that without RDC, you will not be able to call device routines located in external modules, or access device data from external modules. Both require a device linker.
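
Since the end goal is to drive this from Python, the two build steps can also be scripted. The following is only a minimal sketch: the output name "libcuda_so.so" is an assumption (the commands above do not show one after -o), and build_commands and build are hypothetical helper names. The compiler is only invoked if pgf90 is found on the PATH.

```python
import shutil
import subprocess


def build_commands(source="cuda_so.cuf", output="libcuda_so.so"):
    """Return the compile and link commands as argument lists.

    The default output name is an assumption made for illustration.
    """
    obj = source.rsplit(".", 1)[0] + ".o"
    # Step 1: compile position-independent code with RDC disabled.
    compile_cmd = ["pgf90", "-c", "-Mcuda=nordc", "-fpic", source]
    # Step 2: link the shared object, again with RDC disabled.
    link_cmd = ["pgf90", "-o", output, "-Mcuda=nordc", "-shared", obj]
    return [compile_cmd, link_cmd]


def build(source="cuda_so.cuf", output="libcuda_so.so"):
    """Run both steps; return False when pgf90 is not installed."""
    if shutil.which("pgf90") is None:
        return False
    for cmd in build_commands(source, output):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

Keeping the commands as lists avoids shell quoting issues and makes it easy to check that both steps carry the same -Mcuda=nordc setting.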

Hope this helps,

Hi Mat,

It works! Thanks for your suggestion.
Just to make sure I understand: does this mean that, when building shared objects and DLLs, I have to put all the device-related data, subroutines and functions into the single module cuda_so_f90?
Are there any other concerns with this design?


Hi CY,

I should clarify. You can’t directly access device data that lives in external modules (i.e., data you’ve put on the device with the OpenACC “declare” directive in the external module, and/or device data the kernel reaches by using that module). If you pass the external device data as an argument to the device kernel, then it’s fine. If “cuda_so.cuf” is self-contained, then you shouldn’t have any issues.

The problem is that since there’s no device side dynamic linker, external references in device code can’t be resolved.


Hi Mat,

I got it. Thank you very much.
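
For the Python side mentioned in the original question, a call through ctypes might look roughly like the sketch below. It rests on assumptions not confirmed in this thread: that the library was built as "libcuda_so.so" (hypothetical name), and that the compiler passes the two character(len=30) arguments of the bind(c) routine as plain char pointers with no hidden length arguments. The helper names to_fortran_chars and call_test_so are made up for illustration.

```python
import ctypes
import os

FORTRAN_LEN = 30  # matches character(len=30) in test_so


def to_fortran_chars(text, length=FORTRAN_LEN):
    """Encode text as a fixed-length, blank-padded byte string (Fortran style)."""
    raw = text.encode("ascii")[:length]
    return raw.ljust(length, b" ")


def call_test_so(lib_path, ha, hb):
    """Load the shared object and call test_so; return False if the file is missing."""
    if not os.path.exists(lib_path):
        return False
    lib = ctypes.CDLL(lib_path)
    # Assumption: both arguments arrive as char* with no hidden length arguments.
    lib.test_so.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    lib.test_so.restype = None
    # Exactly FORTRAN_LEN bytes each, so Fortran sees fully initialized strings.
    buf_a = ctypes.create_string_buffer(to_fortran_chars(ha), FORTRAN_LEN)
    buf_b = ctypes.create_string_buffer(to_fortran_chars(hb), FORTRAN_LEN)
    lib.test_so(buf_a, buf_b)
    return True
```

Usage would be along the lines of call_test_so("./libcuda_so.so", "hello", "world"). Fortran character variables are blank padded rather than NUL terminated, which is why the strings are padded to the full 30 bytes before the call.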