Passing device pointers between different programs

I’m trying to write MEX-files for Matlab so that I can easily copy data to and from the GPU from within Matlab.

The first part seems to work: I get a device pointer back in Matlab.

// Allocate memory on the graphics card
cutilSafeCall( cudaMalloc((void **)&d_Data, DATA_SIZE) );

// Copy the data to the graphics card
cutilSafeCall( cudaMemcpy(d_Data, h_Data, DATA_SIZE, cudaMemcpyHostToDevice) );

// Send the pointer to Matlab as a 1x1 uint32 array
// (only 32 bits are stored, so this assumes 32-bit pointers)
int ndim = 1;
const mwSize dims[1] = {1};
plhs[0] = mxCreateNumericArray(ndim, dims, mxUINT32_CLASS, mxREAL);

// Copy the pointer value itself (note &d_Data), not the memory it points to
memcpy(mxGetData(plhs[0]), &d_Data, mxGetElementSize(plhs[0]));

The second MEX-file reads the pointer back from Matlab and copies the data to the host:

// Recover the device pointer passed in from Matlab
d_Data = (float *)((size_t)mxGetScalar(prhs[0]));

// Allocate data for Matlab
plhs[0] = mxCreateNumericArray(NUMBER_OF_DIMENSIONS, ARRAY_DIMENSIONS, mxSINGLE_CLASS, mxREAL);
h_Data = (float *)mxGetData(plhs[0]);

// Copy the data from the graphics card
cutilSafeCall( cudaMemcpy(h_Data, d_Data, DATA_SIZE, cudaMemcpyDeviceToHost) );
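
Side note on the pointer passing itself: the snippets here squeeze the device pointer through a 32-bit value, which would truncate it on a 64-bit build. Below is a minimal sketch in plain C of packing a pointer into a 64-bit handle and unpacking it again. The helper names pointer_to_handle and handle_to_pointer are hypothetical (they are not part of MEX or CUDA), and the sketch assumes pointers are at most 64 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a pointer value into a 64-bit handle.
   The pointer is never dereferenced; only its bits are copied.
   Assumes sizeof(void *) <= sizeof(uint64_t). */
static uint64_t pointer_to_handle(const void *ptr)
{
    uint64_t handle = 0;
    /* Copy the pointer itself (note the &), not the memory it points to. */
    memcpy(&handle, &ptr, sizeof ptr);
    return handle;
}

/* Hypothetical helper: recover the pointer stored by pointer_to_handle. */
static void *handle_to_pointer(uint64_t handle)
{
    void *ptr = NULL;
    memcpy(&ptr, &handle, sizeof ptr);
    return ptr;
}
```

In a MEX-file the handle could presumably be stored in a 1x1 mxUINT64_CLASS array (written and read through mxGetData) instead of the uint32 array, so that no bits are lost on either 32-bit or 64-bit builds.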

However, when I try to get the data back I always get “invalid device pointer”. Is it impossible to do what I want to do?

If anyone is interested, I have solved the problem. The “cudaThreadExit()” call seemed to make the device pointer invalid; it works now that I have removed it.