undefined reference to `__pgi_uacc_computestart'

Hello,

I am trying to compile code that mixes C, Fortran, MPI and OpenACC, built with gcc and pgf90, and I get the error mentioned in the subject. For testing I wrote a toy version with three files:

  • add_n.c [MPI, C] - for the moment this has to be compiled with gcc + OpenMPI
  • add_nf.f90 [OpenACC, Fortran]
  • main.c [MPI, C] - for the moment this has to be compiled with gcc + OpenMPI

The first two are compiled into a dynamic library, which is then loaded by main.c.

Without OpenACC support I have no problem compiling and executing:

mpicc -c -fPIC add_n.c 
pgf90 -c -fPIC add_nf.f90    
pgf90 -shared add_nf.o add_n.o -o shared_lib_new.so   
mpicc main.c -ldl shared_lib_new.so -o main.x        

[angelv@deimos temp]$ ./main.x  
 --  Module loaded              
The result of calling the C function is 7.000000  
The result of calling the Fortran function (10000 loops) is 70000.000000
  --  Module unloaded

But if I do use OpenACC, then I get the error above:

[angelv@deimos temp]$ mpicc -c -fPIC add_n.c 
[angelv@deimos temp]$ pgf90 -c -fPIC add_nf.f90  -fast -ta=tesla:cc60,time,lineinfo -acc -Minfo=all,ccff -Mneginfo=all    
add_nf:        
     16, Loop is parallelizable    
         Accelerator kernel generated             
         Generating Tesla code               
         16, !$acc loop gang, vector(128) ! blockidx%x threadidx%x    
         17, Generating implicit reduction(+:add_nf)       
[angelv@deimos temp]$ pgf90 -shared add_nf.o add_n.o -o shared_lib_new.so                
[angelv@deimos temp]$ mpicc main.c -ldl shared_lib_new.so -o main.x             
shared_lib_new.so: undefined reference to `__pgi_uacc_computestart'           
shared_lib_new.so: undefined reference to `__pgi_uacc_downloads'  
shared_lib_new.so: undefined reference to `__pgi_uacc_computedone'                 
shared_lib_new.so: undefined reference to `__pgi_uacc_uploads'  
shared_lib_new.so: undefined reference to `test_add_nf_'  
shared_lib_new.so: undefined reference to `__pgi_uacc_enter'     
shared_lib_new.so: undefined reference to `__pgi_uacc_launch'    
shared_lib_new.so: undefined reference to `__pgi_uacc_noversion' 
collect2: error: ld returned 1 exit status   
[angelv@deimos temp]$

I guess I just need to add the right libraries at link time, but before I go searching I thought I would ask here, in case someone has previous experience with a similar situation.

Many thanks,
AdV

Hi AdV,

I think in this case, all you need to do is add “-ta=tesla:cc60” when building the shared object. This way the OpenACC runtime is linked against the shared object and you don’t need to add the extra libraries during the link.

If you did want to add the libraries to the link, the easiest thing to do is look at the dryrun output from a PGI link to see what libraries the compiler is adding. For example:

% pgfortran -dryrun -ta=tesla:cc60 x.o

/proj/pgi/linux86-64/18.3/bin/pgacclnk -nvidia /proj/pgi/linux86-64/18.3/bin/pgnvd -cuda8000 -cudaroot /proj/pgi/linux86-64/2018/cuda/8.0 -computecap=60 /usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /proj/pgi/linux86-64/18.3/lib/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbegin.o /proj/pgi/linux86-64/18.3/lib/initmp.o /proj/pgi/linux86-64/18.3/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /proj/pgi/linux86-64/18.3/lib/pgi.ld -L/proj/pgi/linux86-64/18.3/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 x.o -rpath /proj/pgi/linux86-64/18.3/lib -rpath /proj/pgi/linux86-64/2018/cuda/8.0/lib64 -rpath /usr/lib/gcc/x86_64-redhat-linux/4.8.5/…/…/…/…/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/…/…/…/…/lib64 /proj/pgi/linux86-64/18.3/lib/acc_init_link_cuda.o -laccapi -laccg -laccn -laccg2 -ldl -lcudadevice -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread --start-group -lpgm -lnspgc -lpgc --end-group -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o /usr/lib64/crtn.o

Note that the exact libraries can change over time. We tend to only make changes during major releases, but they can occur during minor releases as well. This is why I recommend you look at the -dryrun output.
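For inspection, the library flags can be pulled out of a long dryrun line with a small filter. This is only a sketch: `dryrun_libs` is a hypothetical helper name, and the `2>&1` in the usage comment is there because I am not assuming which stream the driver prints the dryrun output to.

```shell
# Print every -l<name> flag found in a linker command line read on stdin,
# one per line, in the original order (repeats are kept, since link order
# can matter).
dryrun_libs() {
  tr -s '[:space:]' '\n' | grep -e '^-l'
}

# Usage (needs the PGI compilers; x.o is the object from the example above):
#   pgfortran -dryrun -ta=tesla:cc60 x.o 2>&1 | dryrun_libs
```

The list is not a complete link line on its own: start-up objects, -L and -rpath paths, and the --start-group/--end-group markers are dropped by the filter.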


Note that when adding OpenACC to shared objects, you’ll need to compile with “nordc” (-ta=tesla:nordc). Relocatable device code (RDC) requires a link step in order to resolve device symbols. However, since there’s no dynamic linker for device code, RDC needs to be disabled for shared objects. This means that code that requires device linking, such as calling external device functions or accessing external global variables (by external I mean located in separate source files), can’t be used.

Hope this helps,
Mat

Hi Mat,

thanks for your help. Your advice did improve things, but I still cannot complete the final link step. If I compile without OpenACC, add_nf.o contains the following symbols related to the add_nf function:

[angelv@deimos temp]$ nm add_nf.o | grep add_nf 
0000000000000010 T add_nf        
0000000000000070 t __add_nfEND     
[angelv@deimos temp]$

but when it is compiled with OpenACC support it also has the undefined symbol test_add_nf_:

[angelv@deimos temp]$ nm add_nf.o | grep add_nf
0000000000000010 T add_nf
00000000000001da t __add_nfEND
                 U test_add_nf_
[angelv@deimos temp]$

which gives trouble at the linking stage:

shared_lib_new.so: undefined reference to `test_add_nf_'
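A quick way to see every unresolved symbol in an object or library, rather than grepping for one name, is to filter the `nm` output on its type column. A minimal sketch, where `undefined_symbols` is a hypothetical helper name:

```shell
# Read `nm` output on stdin and print the names of undefined symbols,
# i.e. the lines whose type column is "U" (they have no address column).
undefined_symbols() {
  awk '$1 == "U" { print $2 }'
}

# Usage:
#   nm add_nf.o | undefined_symbols
#   nm -D shared_lib_new.so | undefined_symbols
```

Run on the OpenACC build of add_nf.o, this lists test_add_nf_ together with any other undefined symbols, which grepping for add_nf alone would hide.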

The add_nf.f90 code is a Fortran module like this:

module test
  USE iso_c_binding

contains
  function add_nf(a,b) bind(c, name='add_nf')
    use, intrinsic :: iso_c_binding
    implicit none
    real(c_double), intent(in) :: a,b
    real(c_double) :: add_nf
    integer :: c

    add_nf = 0.0

    !$acc kernels
    do c=1,10000
      add_nf = add_nf + (a + b)
    end do
    !$acc end kernels
  end function add_nf
end module test

Any idea what I’m missing?

Hi AdV,

This looks like a new bug in 18.1 where the CUDA kernel constructor is using the Fortran name for the function rather than the Bind C name. I’ve reported it to engineering (TPR#25570).

The error does not occur when you use the “-ta=tesla:nordc” option which you’ll need to use anyway when including OpenACC code in a shared object. Can you try adding “nordc” to see if it works around the problem?

-Mat

That is great, many thanks.

I had added the nordc option only to the command that creates the shared library, and I still had the problem with the undefined symbol.

Now I have tried again, and the code works fine if I add the nordc option both to the compilation of add_nf.f90 and to the creation of the shared library. It also works if I add nordc only to the compilation of add_nf.f90, and I am not sure whether that is expected.
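For reference, the working sequence for the toy example can be written down as below. It is a sketch that reuses the file names and flags quoted in this thread (the optional time, lineinfo and extra -Minfo settings are left out); `run`, `build_acc_lib` and the `DRY_RUN` switch are hypothetical names added so the commands can be printed without the compilers installed.

```shell
# Run a command, or just print it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the toy shared library with OpenACC and link main.x against it.
# nordc disables relocatable device code, which a shared object needs;
# it is passed at both the compile step and the -shared link step.
build_acc_lib() {
  run mpicc -c -fPIC add_n.c &&
  run pgf90 -c -fPIC -fast -acc -ta=tesla:cc60,nordc add_nf.f90 &&
  run pgf90 -shared -ta=tesla:cc60,nordc add_nf.o add_n.o -o shared_lib_new.so &&
  run mpicc main.c -ldl shared_lib_new.so -o main.x
}

# Usage:
#   DRY_RUN=1 build_acc_lib   # print the four commands
#   build_acc_lib             # run them (needs mpicc and pgf90)
```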

Many thanks for your help,
AdV

Hi,

that was only a toy code. When I tried to compile the real code I ran into trouble, because the real Fortran code calls another C routine from the accelerator region, so it fails to compile:

pgf90 -c -cpp -fPIC -fast -ta=tesla:cc60,time,lineinfo,nordc -acc -Minfo=all,ccff -Mneginfo=all  math.f90
[...]
ptxas fatal   : Unresolved extern function 'Faddeeva_subw'
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (math.f90: 1)

So, as I understand it, I cannot use nordc as it is because I use external code, and I cannot compile without nordc because I am compiling into a shared library (and even if it was not a shared library, in any case there is the bug about the names in the .o files).
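To get an idea of how many external device routines are affected across a larger build, the ptxas messages can be collected from a saved build log. A small sketch, assuming the messages have exactly the format shown above; `unresolved_device_functions` and `build.log` are hypothetical names:

```shell
# Read compiler output on stdin and print the unique names reported as
# "Unresolved extern function" by ptxas.
unresolved_device_functions() {
  sed -n "s/^ptxas fatal *: Unresolved extern function '\([^']*\)'.*/\1/p" | sort -u
}

# Usage (build.log is a hypothetical file holding the output of all compiles):
#   unresolved_device_functions < build.log
```

Each compile may report only its first unresolved function before stopping, so the list should be treated as a lower bound.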

Am I out of luck, or is there some way out?

Thanks,
AdV

Hi AdV,

How much code restructuring can you do?

While we can’t do inlining across languages, if you can re-write the routine in Fortran and then have the device routine within the same module, then the device routine will be inlined.

-Mat

Hi Mat,

our real code is quite complex and that would involve quite a lot of work. The actual shared library that I need to build is made of C code, which calls Fortran, which calls C, though OpenACC would be used only in the last two layers (i.e. Fortran calling C). I guess the most sensible thing to do is to port the Fortran layer to C, though I was hoping there could be some other way out…

Many thanks,
AdV

Hi AdV,

TPR #25570 is resolved with 18.5. If possible, try your code with that version.

- Alex