Simple program won't exit if cudaMalloc is called.

The following simple program never exits if the cudaMalloc call is executed. Commenting out the cudaMalloc and cudaFree causes it to exit normally.

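#include <iostream>
using std::cout;
using std::cin;

#include "cuda.h"
#include "cutil_inline.h"

// Print an encoded CUDA version (1000 * major + 10 * minor) as "major.minor".
void PrintCudaVersion(int version, const char *name)
{
    int versionMaj = version / 1000;
    int versionMin = (version - (versionMaj * 1000)) / 10;
    cout << "CUDA " << name << " version: " << versionMaj << "." << versionMin << "\n";
}

void ReportCudaVersions()
{
    int version = 0;

    // The two query calls are inferred from the version lines in the program output.
    cudaDriverGetVersion(&version);
    PrintCudaVersion(version, "Driver");

    cudaRuntimeGetVersion(&version);
    PrintCudaVersion(version, "Runtime");
}

int main(int argc, char **argv)
{
    // Inferred from the "Init result: 0" line in the program output.
    cout << "Init result: " << cuInit(0) << "\n";
    ReportCudaVersions();

    void *ptr = NULL;
    cudaError_t err = cudaSuccess;

    err = cudaMalloc(&ptr, 1024*1024);
    cout << "cudaMalloc returned: " << err << "  ptr: " << ptr << "\n";

    err = cudaFree(ptr);
    cout << "cudaFree returned: " << err << "\n";

    return 0;
}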

This is running on Windows 7 with the CUDA 4.1 driver and the CUDA 3.2 runtime. I've traced the return from main through the CRT to ExitProcess(), which never returns (as expected), but the process never ends either. From VS2008 I can stop debugging normally. From the command line, I have to kill the console window.

Program output:

Init result: 0

CUDA Driver version: 4.1

CUDA Runtime version: 3.2

cudaMalloc returned: 0  ptr: 00210000

cudaFree returned: 0

I tried making the allocation amount so large that cudaMalloc would fail. It did and reported an error, but the program still would not exit. So it apparently has to do with merely calling cudaMalloc, not the existence of allocated memory.

Any ideas as to what is going on here?

After reverting the CUDA driver to version 3.2, the problem is fixed. I had been under the impression that a newer driver is backward compatible with an older runtime, but apparently not in this case.
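
For anyone who wants to check for this kind of driver/runtime mismatch at startup, here is a rough sketch of how the two version numbers could be decoded and compared. It relies only on the 1000 * major + 10 * minor encoding that the program above already decodes. The helper names (CudaVersion, DecodeCudaVersion, FormatCudaVersion, DriverCoversRuntime) are made up for illustration and are not part of CUDA. The comparison encodes the usual expectation that a driver at least as new as the runtime should work, which is exactly the assumption that did not hold here, so treat it as a sanity check rather than a guarantee.

```cpp
#include <sstream>
#include <string>

// Hypothetical type: a CUDA version split into its two parts.
// (The fields avoid the names "major" and "minor", which some C libraries define as macros.)
struct CudaVersion {
    int majorPart;
    int minorPart;
};

// Hypothetical helper: decode the integer form (1000 * major + 10 * minor),
// the same arithmetic PrintCudaVersion uses in the original program.
CudaVersion DecodeCudaVersion(int encoded)
{
    CudaVersion v;
    v.majorPart = encoded / 1000;
    v.minorPart = (encoded % 1000) / 10;
    return v;
}

// Hypothetical helper: render an encoded version as "major.minor".
std::string FormatCudaVersion(int encoded)
{
    const CudaVersion v = DecodeCudaVersion(encoded);
    std::ostringstream out;
    out << v.majorPart << "." << v.minorPart;
    return out.str();
}

// Hypothetical helper: true when the driver version is at least the runtime version.
// This is only the nominal compatibility rule; it cannot detect a case like the
// one in this thread, where a newer driver still misbehaved with an older runtime.
bool DriverCoversRuntime(int driverVersion, int runtimeVersion)
{
    return driverVersion >= runtimeVersion;
}
```

In the original program the two inputs would be the values written by cudaDriverGetVersion and cudaRuntimeGetVersion. With the versions reported above (4010 and 3020), the check passes, which is why the hang was a surprise.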