Freeing Pinned Memory During Kernel Execution

I have some code that makes use of concurrent kernels on devices with second generation SMs. I am trying to free a region of pinned memory on the CPU while another kernel is executing on the device, however I keep getting the following error when calling ‘cudaFreeHost’: “the launch timed out and was terminated.” Is there a reason that we are unable to free pinned memory while a kernel is executing on the device even though the executing kernel doesn’t touch the pinned memory? This is significantly degrading my performance as it forces me to serialize memory management operations with execution. In order to get around this I’m simply allocating a large region of pinned memory at start up and having my runtime do its own memory management, thereby forcing my runtime to pin a large number of pages for long periods of time unnecessarily. This is just a temporary fix and will ultimately result in additional performance losses with large data sets that will no longer fit in my CPU memory because of all the pinned pages.

Below is an example program that when run on CUDA 3.1 on a Linux platform will illustrate the problem of being unable to free pinned memory during kernel execution.




#include “cuda.h”

#include “cuda_runtime.h”


void doNothing(int *finished)


    while ((*((volatile int*)(finished))) == 0) { }



int main()


    int *finished_h, *finished_d;

    cudaError_t err = cudaHostAlloc((void**)&finished_h, sizeof(int), cudaHostAllocMapped);

    assert(err == cudaSuccess);

    err = cudaHostGetDevicePointer((void**)&finished_d,finished_h,0);

    assert(err == cudaSuccess);

(*finished_h) = 0;

// Allocate some pinned memory

    int *ptr;

    err = cudaMallocHost((void**)&ptr, sizeof(int));

    assert(err == cudaSuccess);

// Launch the kernel


// Try to free the pinned memory

    err = cudaFreeHost(ptr);

    if (err != cudaSuccess)


            fprintf(stderr,"Unable to free pinned memory!\n%s\n", cudaGetErrorString(err));



// Tell the kernel that we are done

    (*finished_h) = 1;

err = cudaThreadSynchronize();

    assert(err == cudaSuccess);



return 0;