host_data fails for dynamically loaded library

Hi,

I have some OpenACC C++ code that calls a Thrust function (compiled with nvcc) to sort an array of floats. Within the C++ code I call it like this:


void* stream = acc_get_cuda_stream(acc_async_sync);

#pragma acc data present(positions)
{
#pragma acc host_data use_device(positions)
{
    sort( positions, N, stream );
}
}

When I compile this into a binary it works fine, and I checked that the address of the array positions changes between the outer data block and the host_data block.
However, when I compile this into a dynamically loaded library I get an error:

terminate called after throwing an instance of ‘thrust::system::system_error’
what(): CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered

I also checked that the address in the host_data block is the same as in the enclosing block, so it seems the address doesn't get translated correctly. Is there a compile option to make this work?
I use “nvcc -Xcompiler -fPIC” to compile the thrust code.
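For reference, the address check described above could be sketched roughly as follows. The function name host_data_translates is hypothetical, and the sketch assumes that positions was already placed on the device by an earlier data region. Without OpenACC enabled the pragmas are skipped and the function simply reports that no translation happened.

```cpp
#include <cstdio>

// Hypothetical helper: reports whether the pointer seen inside host_data
// differs from the host pointer. Assumes 'positions' is already present
// on the device when the code is built with OpenACC enabled.
bool host_data_translates(float* positions)
{
    const void* host_addr  = positions;   // address seen by ordinary host code
    const void* inner_addr = host_addr;   // overwritten inside host_data

#ifdef _OPENACC
    #pragma acc data present(positions)
    {
        #pragma acc host_data use_device(positions)
        {
            // Inside host_data, 'positions' should refer to the device copy.
            inner_addr = positions;
        }
    }
#endif

    std::printf("host: %p  host_data: %p\n", host_addr, inner_addr);
    return inner_addr != host_addr;
}
```

If the two printed addresses match in a build where OpenACC is enabled, the pointer passed on to the CUDA side is still a host pointer, which would be consistent with the illegal address error above.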

Kind regards,
Rob

Hi Rob,

How are you linking the DLL? Are you using a DLLmain?

In order to interoperate with CUDA, the OpenACC runtime needs to be initialized a bit differently. During a normal link, we add a call to the binary’s init section which calls the appropriate initialization routine. For a DLL, there’s no init section, so no initialization. Instead, you need to manually call the init routines from a DLLmain.

Below is the generic DLLmain that I use; I comment or uncomment the appropriate init calls depending on the code. While I haven’t tried your particular case, I think you only need to call the “__pgi_uacc_set_link_cuda” routine, though you might need “__pgi_acc_preinit” as well.

Note that this is a C++ file, due to the inclusion of “windows.h”, and that I compile it with MSVC++.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

extern "C"
{
void __setchk(long*,size_t,size_t);
void _mp_preinit(void);
void __pgi_acc_preinit(void);
void __pgi_uacc_set_link_multicore(void);
void __pgi_uacc_set_link_cuda(void);
void __pgi_ctrl_init();
}

BOOL WINAPI DllMain(
    HINSTANCE hinstDLL,  // handle to DLL module
    DWORD fdwReason,     // reason for calling function
    LPVOID lpReserved )  // reserved
{
    // Perform actions based on the reason for calling.
    switch( fdwReason )
    {
    case DLL_PROCESS_ATTACH:
        long n;

        // printf("Calling setchk\n");
        // __setchk(&n+256+128*1024,0,0);
        // printf("Calling acc_preinit\n");
        // __pgi_acc_preinit();
        // printf("Calling mp_preinit\n");
        // _mp_preinit();
        // __pgi_uacc_set_link_multicore();
        // __pgi_ctrl_init();
        printf("Calling set link cuda\n");
        __pgi_uacc_set_link_cuda();
        break;

    case DLL_THREAD_ATTACH:
        break;

    case DLL_THREAD_DETACH:
        break;

    case DLL_PROCESS_DETACH:
        break;
    }

    return TRUE; // Successful DLL_PROCESS_ATTACH.
}

-Mat

Hi Mat,

Thanks for the quick help! I compile on Linux, and “__pgi_uacc_set_link_cuda” did the trick.
Is there any documentation for these functions?
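
For anyone hitting the same problem on Linux, one way such a call could be wired into a shared library is sketched below. This is only an illustration under assumptions: it relies on the GCC-style constructor and weak attributes being supported by the compiler that builds the library, and the function and flag names (init_openacc_cuda_link, g_acc_cuda_link_attempted, g_acc_cuda_link_called) are hypothetical. Only “__pgi_uacc_set_link_cuda” comes from this thread, and it is an internal routine.

```cpp
// Internal PGI runtime routine named in this thread (undocumented).
// Declared weak (GCC/Clang extension, an assumption here) so the file still
// links when the PGI OpenACC runtime is absent; its address is then null.
extern "C" void __pgi_uacc_set_link_cuda(void) __attribute__((weak));

// Hypothetical flags, only used to observe what the initializer did.
static bool g_acc_cuda_link_attempted = false;
static bool g_acc_cuda_link_called    = false;

// Runs when the shared library is loaded (for example via dlopen),
// playing the role that DllMain's DLL_PROCESS_ATTACH plays on Windows.
__attribute__((constructor))
static void init_openacc_cuda_link()
{
    g_acc_cuda_link_attempted = true;
    if (__pgi_uacc_set_link_cuda) {      // non-null only if the runtime is linked in
        __pgi_uacc_set_link_cuda();
        g_acc_cuda_link_called = true;
    }
}
```

The weak declaration is a design choice that keeps the sketch buildable without the PGI runtime; in a real build against the runtime, a plain declaration would turn a missing symbol into a link error instead of a silent skip.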

Kind regards,
Rob

Is there any documentation for these functions?

No, sorry, these are internal runtime routines that aren’t documented. The only time I’ve really needed to expose them is from a Windows DLL, since these have their own process and so can’t be initialized from the main program. (I misread your title as dynamically linked library, i.e. DLL, not loaded.)

On Linux, the recommendation is to link with PGI so the appropriate link-time objects and libraries are implicitly included.

Otherwise, you’ll need to add the “-dryrun” flag to a dummy pgc++ link, with the appropriate flags, to get all the needed libraries and objects. The exact libraries and objects can change from release to release. In this case, the initialization object you should try adding to your link is “$PGI/linux86-64-llvm/19.10/lib/acc_init_link_cuda.o”, which gets included in the init section of the binary and calls the “__pgi_uacc_set_link_cuda” routine.

-Mat