Use of driver API in DLLs causes a hang on exit on some configurations

vaivaswatha · May 3, 2017, 5:05am

Hi,

I’m facing a strange problem where a CUDA program, that uses driver APIs, when built as a DLL, causes an executable that links to the DLL to hang after it exits main(). The process then has to be killed from the task manager.

The same code when built as an executable works fine. Also, this problem occurs only on certain configurations (details later in the post).

DLL.cpp: This file is compiled and built into a DLL.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define CUDA_SAFE_CALL(x)                                         \
  do {                                                            \
    CUresult result = x;                                          \
    if (result != CUDA_SUCCESS) {                                 \
      const char *msg;                                            \
      cuGetErrorName(result, &msg);                               \
      fprintf(stderr, "CUDA: %s\n", msg);                         \
    }                                                             \
   } while(0)


// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>
#include <cuda.h>

__declspec(dllexport) int __cdecl
mainEntry(void)
{
    cuInit(0);
    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Print the vector length to be used, and compute its size
    int numElements = 50000;
    size_t size = numElements * sizeof(float);

    // Allocate the device input vector A
    float *d_A = NULL;
    err = cudaMalloc((void **)&d_A, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free device global memory
    //err = cudaFree(d_A);
    CUDA_SAFE_CALL(cuMemFree((CUdeviceptr)d_A));

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    printf("Hello world\n");
    return 0;
}

main.cpp: This file is built as an executable that uses the DLL above.

int mainEntry(void);

int main()
{
    mainEntry();
}

The executable hangs at exit (i.e., after printing “Hello world”) on the following configuration:
Windows 7 Professional
NVIDIA driver 369.3
CUDA Toolkit 8.0
GeForce GTX 980ti / GTX 960 / GTX 560Ti

This looks to be a CUDA driver bug and any workaround suggestions would be very helpful.

Thanks.

Vaivaswatha.

Robert_Crovella · May 3, 2017, 1:37pm

Update to CUDA 8.0.61 and a driver that is appropriate for 8.0.61. 369.3 is no longer a correct driver to use with CUDA 8, depending on the version of CUDA 8.

Why are you mixing CUDA runtime API functions:

err = cudaMalloc((void **)&d_A, size);

with CUDA driver API functions:

CUDA_SAFE_CALL(cuMemFree((CUdeviceptr)d_A));

You are not meeting the proper requirements for runtime API/driver API interoperability as defined here:

[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#interoperability-between-runtime-and-driver-apis[/url]

vaivaswatha · May 3, 2017, 2:12pm

Thank you for the response. I’ll try the driver update.

What exactly in the interoperability requirement am I missing?

I have a call to cudaMalloc() before the driver API usage which will setup a Context
([url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#initialization[/url]).

Since there now is a default context setup, using the driver API cuMemFree should work.

One aspect I just realised could be related to the requirement of pushing and popping contexts from library code.
[url]Programming Guide :: CUDA Toolkit Documentation

Could that be it? Anyway I’ll give that a try.

Robert_Crovella · May 3, 2017, 2:33pm

A normal driver API code will establish a context and make it current, before invoking further driver API functions that are intended to refer to that context.

In the interoperability section, I read the following:

“If the runtime is initialized (implicitly as mentioned in CUDA C Runtime), cuCtxGetCurrent() can be used to retrieve the context created during initialization. This context can be used by subsequent driver API calls.”

So in this case, since you are implicitly establishing a context via the runtime API (cuInit does not establish a context, cudaMalloc does, implicitly), I would expect to see the retrieval of that context and making it current for the driver API, before invoking driver API calls that refer to that context. The example given in the quoted text is the use of cuCtxGetCurrent(). I don’t see that call, or any driver API context calls, in your posted code.

vaivaswatha · May 3, 2017, 2:42pm

I do not call cuCtxGetCurrent() because I did not have the need to specify the context to the driver API calls I made. In other words, I am okay with the default context that is already set by the runtime API.

Do you mean to say that I must call cuCtxGetCurrent() and use the result to set cuCtxSetCurrent()? The driver API is expected to just use the current context automatically right, as long as it has been created.

vaivaswatha · May 4, 2017, 3:39am

Also, In my code above, even if I just use cudaFree instead of cuMemFree, just the presence of cuInit() is sufficient to cause the hang.

vaivaswatha · May 4, 2017, 4:42am

It looks like I have a solution to the problem.

Based on this [url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/#context[/url] I created a new context and pushed it on the stack before executing GPU code in the DLL (and later popped it back after the work was done). That fixed it for me.

Thank you @txbob.

Topic		Replies	Views
How to write an application that optionally uses CUDA? without actually requiring any CUDA related CUDA Programming and Performance	16	19304	March 25, 2009
Strange cudaErrors popping up during dll loading. CUDA Programming and Performance	5	12494	October 20, 2011
CUDA plugin dlopen CUDA Programming and Performance	6	4619	September 24, 2010
Using the dynlink API CUDA Programming and Performance	2	4753	September 14, 2011
Recoving after a TDR event CUDA Programming and Performance	14	2140	April 20, 2016
Delay load cuda runtime dll (cudart.dll) CUDA Programming and Performance	8	12401	May 28, 2009
cuda.h error message CUDA Programming and Performance	9	6034	October 22, 2009
Multiple GPUs host thread safety? CUDA Programming and Performance	6	14075	July 15, 2010
Export C DLL library using Cuda Export C DLL CUDA Programming and Performance	3	6365	April 6, 2007
Undefined reference to a CUDA function in a dll (MSVC 2017 + MingW) CUDA Programming and Performance	10	1925	May 18, 2018

Use of driver API in DLLs causes a hang on exit on some configurations

Related topics