"nvlink error : Undefined reference to ...", when using OpenACC to offload NTL code to the GPU

Hello, we are two undergrad students working on parallelizing a program that searches through large ranges of integers, using nvc++ and OpenACC. We are using the Number Theory Library (NTL).

Our current workflow is compiling the library with g++ and statically linking it with our program, which we then compile using nvc++ and OpenACC.
After modifying the NTL header files with the necessary ‘#pragma acc declare’ and ‘#pragma acc routine seq’ directives, we encountered several “nvlink error: Undefined reference to …” errors. They appear to come from the device linker not finding definitions for the NTL functions that our code uses and that are declared in the NTL header files.

As a demonstration of our problem, we have attached a small code example (test.c), a compile.sh to compile our example, and the build.sh we’re using to build NTL. When compiling this example, we encountered the following errors, which are representative of the errors we get when compiling our main program:

nvlink error : Undefined reference to ‘_ZN3NTL6randomERNS_5ZZ_pXEl’ in ‘/tmp/nvc++zSofl4tYMlby.o’
nvlink error : Undefined reference to ‘_ZN3NTL12BlockDestroyEPNS_4ZZ_pEl’ in ‘/tmp/nvc++zSofl4tYMlby.o’
nvlink error : Undefined reference to ‘_ZN3NTL9ZZ_pEInfoE’ in ‘/tmp/nvc++zSofl4tYMlby.o’

We were able to trace the _ZN3NTL9ZZ_pEInfoE error back to line 36 of ZZ_pE.h (it’s probably related to the missing definition of ZZ_pEInfo, which is only declared there with “extern”). The _ZN3NTL12BlockDestroyEPNS_4ZZ_pEl error is probably related to the BlockDestroy function declared in ZZ_p.h, line 553, whose definition in vec_ZZ_p.cpp cannot be found at link time.

Since we’re still new to OpenACC, we are unable to fix these errors without creating new ones. Are these problems related to the fact that we’re using different compilers for building the library and compiling our program? Or does the problem lie somewhere else? Thanks in advance for your help!

Small Example.zip (890 Bytes)

Hi praxster,

Correct, the problem comes from building the library with a different compiler. Using device routines has two parts. The caller needs to know that a device version is available (which is what you did when adding “acc routine” to the headers), and the callee needs to be compiled so that a device version is actually created. Skipping the second part results in undefined references, since there’s no device code for these routines to link against.
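To make the two parts concrete, here is a minimal single-file sketch. It does not use NTL; add_mod and add_mod_all are hypothetical names made up for illustration. The first directive plays the role of the one you put in the header, the second is on the definition, which is the part that has to go through nvc++ with OpenACC enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Header" part: tells every caller that a sequential device version exists.
#pragma acc routine seq
long add_mod(long a, long b, long p);

// "Source" part: this definition must also be compiled with OpenACC enabled,
// otherwise no device code is generated and the device link step fails.
#pragma acc routine seq
long add_mod(long a, long b, long p) {
    long s = (a + b) % p;
    return s < 0 ? s + p : s;  // keep the result in [0, p)
}

// Offloaded loop that calls the device routine once per element.
std::vector<long> add_mod_all(const std::vector<long>& in, long b, long p) {
    assert(p > 0);  // host-side precondition
    std::vector<long> out(in.size());
    const long* src = in.data();
    long* dst = out.data();
    const std::size_t n = in.size();
    #pragma acc parallel loop copyin(src[0:n]) copyout(dst[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = add_mod(src[i], b, p);
    }
    return out;
}
```

If the definition lives in a library built by a compiler that ignores the pragma, only the host version of add_mod exists, which is the situation that produces the nvlink errors above.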

You’ll need to build NTL, at least the parts you want to use, with nvc++ and OpenACC enabled.

Now I have no idea how NTL is implemented, but be careful with RNGs. Often they have global state which, if accessed in parallel, can cause race conditions. Typically you’d need an RNG where each thread maintains its own state. If NTL’s RNG isn’t parallelizable, this one may be a good alternative: Pseudo Random Number Generation by Lightweight Threads | OpenACC
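As a rough illustration of "each thread maintains its own state", here is a small sketch that uses the public SplitMix64 mixing step as a stand-in generator. This is neither NTL's RNG nor the library linked above, and random_per_index is a hypothetical helper name. Each loop iteration derives a private state from the seed and its index, so nothing is shared between iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One SplitMix64 step: advances the caller-owned state and returns 64 bits.
#pragma acc routine seq
std::uint64_t splitmix64_next(std::uint64_t& state) {
    state += 0x9E3779B97F4A7C15ULL;
    std::uint64_t z = state;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Produces one value per index; no generator state is shared across iterations.
std::vector<std::uint64_t> random_per_index(std::size_t n, std::uint64_t seed) {
    assert(n > 0);  // host-side precondition
    std::vector<std::uint64_t> out(n);
    std::uint64_t* dst = out.data();
    #pragma acc parallel loop copyout(dst[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        // Hash (seed + i) once so neighbouring indices start far apart.
        std::uint64_t tmp = seed + i;
        std::uint64_t state = splitmix64_next(tmp);  // private to this iteration
        dst[i] = splitmix64_next(state);
    }
    return out;
}
```

Because the state is a local variable inside the loop body, there is no global generator to race on, and the same seed reproduces the same sequence regardless of how the iterations are scheduled. For serious statistical work a vetted parallel RNG would be a better choice than this sketch.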

Hope this helps,

Hey Mat,

thanks for the speedy response, we’ll take your advice and build NTL with nvc++.
It helped us a lot.

Take care,
