CUDA plugin dlopen

Hi,
I’ve written a CUDA plugin (a dynamic library) which is loaded by another application. I use dlopen() and dlsym() to call the functions from this plugin. For my application it is very important that the program gets a new handle for the dynamic library every time it uses the plugin (the library file will subsequently be modified).
Therefore, after using the functions from my plugin I call dlclose(). The dlopen() - dlsym() - dlclose() sequence is repeated many times during my program’s execution.
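
For readers following along, the load/call/unload cycle described above could look roughly like the sketch below. The function name call_plugin_once, the plugin_fn signature, and the idea that the plugin exports a no-argument function returning int are all assumptions made for illustration; the original plugin's interface is not shown in the post.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical plugin entry point type: no arguments, returns a status. */
typedef int (*plugin_fn)(void);

/* Load the library, call one function from it, then unload it again.
   Returns the plugin's result, or -1 if loading or symbol lookup fails. */
int call_plugin_once(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();  /* clear any stale error before calling dlsym */
    plugin_fn fn = (plugin_fn)dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    int result = fn();

    /* The library is only unloaded once its reference count reaches zero. */
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
    }
    return result;
}
```

On older glibc versions this needs to be linked with -ldl. Note that a plugin which itself returns -1 cannot be distinguished from a load failure in this sketch.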

When I work on a computer with NVIDIA driver 256.35 (CUDA 3.0 or 3.1) I get a memory leak (I call cudaMemGetInfo() in my plugin for diagnostics).
When I work on a computer with NVIDIA driver 195.36.15 (CUDA 3.0) I get this error after the program has been running for some time: “NVIDIA: could not open the device file /dev/nvidia0 (Too many open files).”

If I don’t call dlclose() the program works fine, but in that case I can’t replace the plugin with a new one while my program is running.

Has anyone encountered this problem?
Thanks.

I’ve found a similar example in the CUDA SDK: matrixMulDynlinkJIT. I’ve made a small change to the code. In particular, in the file cuda_drvapi_dynlink.c I’ve modified the cuInit() function:

CUDADRIVER CudaDrvLib = NULL;

CUresult CUDAAPI cuInit(unsigned int Flags)
{
//CUDADRIVER CudaDrvLib;
CUresult result;
int driverVer;

if (CudaDrvLib != NULL) {
  dlclose (CudaDrvLib);
  CudaDrvLib = NULL;
}
 .......

}

And in the file matrixMulDynlinkJIT.cpp I’ve added a loop to the main() function:

int main(int argc, char** argv)
{
printf("[ %s ]\n", sSDKsample);

while (1) {
   // initialize CUDA

   CUfunction matrixMul = NULL;
   cutilDrvSafeCallNoSync(initCUDA(&matrixMul, argc, argv));
   
    .....

} //while (1)
cutilExit();

}

So I get the same problem as in my program (after it has been running for some time): “NVIDIA: could not open the device file /dev/nvidia0 (Too many open files).”
But when I comment out the dlclose() in the cuda_drvapi_dynlink.c file, everything works fine.

I can’t understand this behavior…
Any ideas?

Maybe the .so (DLL) should call cudaThreadExit() before unloading.

I do not currently know how to accomplish this on Linux, but on Windows you’ve got the DllMain() function which gets called when loading and unloading the DLL with DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH arguments.

On Linux there should be a similar mechanism you can hook into to make sure your CUDA Context is properly disposed of.

Found some documentation on this topic:

5.2. Library constructor and destructor functions

Libraries should export initialization and cleanup routines using the gcc __attribute__((constructor)) and __attribute__((destructor)) function attributes. See the gcc info pages for information on these. Constructor routines are executed before dlopen returns (or before main() is started if the library is loaded at load time). Destructor routines are executed before dlclose returns (or after exit() or completion of main() if the library is loaded at load time). The C prototypes for these functions are:

void __attribute__ ((constructor)) my_init(void);
void __attribute__ ((destructor)) my_fini(void);

Shared libraries must not be compiled with the gcc arguments “-nostartfiles” or “-nostdlib”. If those arguments are used, the constructor/destructor routines will not be executed (unless special measures are taken).
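
Applied to the plugin, the hooks could be sketched like this. The names plugin_init, plugin_fini and plugin_initialized are invented for the example, and the CUDA call appears only as a comment because whether cudaThreadExit() in a destructor actually cures the descriptor leak has not been tested here.

```c
/* Hypothetical flag so the host (or a test) can see that the hooks ran. */
int plugin_initialized = 0;

/* Runs before dlopen() returns (or before main() for load-time linking). */
__attribute__((constructor))
static void plugin_init(void)
{
    plugin_initialized = 1;
}

/* Runs before dlclose() returns (or at process exit for load-time linking). */
__attribute__((destructor))
static void plugin_fini(void)
{
    /* Assumption: this is where the plugin would release its CUDA state,
       e.g. by calling cudaThreadExit() as suggested earlier in the thread. */
    plugin_initialized = 0;
}
```

Built into the plugin with something like gcc -shared -fPIC, plugin_fini would run on the dlclose() that drops the last reference to the library, which is the point at which the cleanup needs to happen.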
