__pgi_gangidx() doesn't work in DLL

I tried using the __pgi_gangidx() function within a parallel loop. It works fine in a regular binary, and with “-ta=tesla” it also works fine in a DLL, but it stops working when I use it with “-ta=multicore” within a dynamically loaded library.
Is there a special linker flag one has to use? Does gangidx() under the hood use OpenMP functionality?

Kind regards,
Rob

Hi Rob,

Does gangidx() under the hood use OpenMP functionality?

In general, OpenACC targeting multicore does use much of the OpenMP runtime. I personally don’t use gangidx often since it’s a PGI extension, but it should correspond to omp_get_thread_num().
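To illustrate that correspondence, here is a minimal sketch of the usual pattern a per-thread index is used for: giving each thread its own slot in a scratch array. It uses only the standard OpenMP call omp_get_thread_num() and does not call __pgi_gangidx() itself, so treat it as an analogue, not as what the PGI runtime does internally. The function name sum_with_thread_slots is hypothetical.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: sums 0..n-1 using one partial-sum slot per thread.
// The thread index plays the role that a gang index would play in the
// multicore OpenACC case (an assumption based on the correspondence above).
long long sum_with_thread_slots(int n)
{
#ifdef _OPENMP
    const int nslots = omp_get_max_threads();
#else
    const int nslots = 1;  // serial fallback when built without OpenMP
#endif
    std::vector<long long> partial(nslots, 0);

#pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        // A bad index here would point at an uninitialized or broken runtime.
        assert(tid >= 0 && tid < nslots);

        long long local = 0;
#pragma omp for
        for (int i = 0; i < n; ++i)
            local += i;

        partial[tid] = local;  // each thread writes only its own slot
    }

    // Combine the per-thread partial sums serially.
    long long total = 0;
    for (long long p : partial)
        total += p;
    return total;
}
```

Calling sum_with_thread_slots(1000) should give the same result whether or not OpenMP is enabled. If the thread index came back wrong, the slots would collide and the assert or the total would show it.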

Is there a special linker flag one has to use?

I haven’t tried using gangidx specifically in a DLL, but I have run into other problems when the PGI runtime was not initialized from DllMain.

It’s been a few years since I’ve done any work with DLLs, but I use the following DllMain, commenting out or uncommenting the different init calls depending on which target I’m using and whether I’m linking with CUDA.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// PGI runtime initialization routines (C linkage).
extern "C"
{
    void _mp_preinit(void);
    void __pgi_acc_preinit(void);
    void __pgi_uacc_set_link_multicore(void);
    void __pgi_uacc_set_link_cuda(void);
}

BOOL WINAPI DllMain(
    HINSTANCE hinstDLL,  // handle to DLL module
    DWORD fdwReason,     // reason for calling function
    LPVOID lpReserved)   // reserved
{
    // Perform actions based on the reason for calling.
    switch (fdwReason)
    {
    case DLL_PROCESS_ATTACH:
        // OpenACC targeting GPUs
        // __pgi_acc_preinit();

        // Initialize the OpenMP runtime
        _mp_preinit();

        // Initialize the OpenACC multicore runtime
        __pgi_uacc_set_link_multicore();

        // If linking with CUDA
        // __pgi_uacc_set_link_cuda();
        break;

    case DLL_THREAD_ATTACH:
        break;

    case DLL_THREAD_DETACH:
        break;

    case DLL_PROCESS_DETACH:
        break;
    }

    return TRUE;  // Successful DLL_PROCESS_ATTACH.
}

-Mat