cudaFree invalid device pointer

I am trying to debug some code that is behaving strangely: each time it runs, its execution time increases. Using cudaMemGetInfo, I found that cudaFree is not actually freeing any memory, so less device memory is available on the next call. The error returned by cudaFree is “invalid device pointer”. I don't understand why, because the pointer is declared within the same code block.
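A minimal sketch of the cudaMemGetInfo check described above. It measures how much free device memory disappears across one call. The names freeDeviceBytes and reportLoss are hypothetical, and small allocations may be rounded up or pooled by the driver, so the reported numbers need not match the requested sizes one-for-one.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Free device memory in bytes, or 0 if the query fails.
static size_t freeDeviceBytes()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return 0;
    return freeBytes;
}

// Hypothetical wrapper: runs `body` once and prints how much free device
// memory is gone afterwards. A value that keeps growing across repeated
// calls (for example, the 512 MATLAB iterations) points to a leak.
static long long reportLoss(const char *label, void (*body)())
{
    assert(body != nullptr);
    size_t before = freeDeviceBytes();
    body();
    size_t after = freeDeviceBytes();
    long long lost = (long long)before - (long long)after;
    printf("%s: %lld bytes no longer free\n", label, lost);
    return lost;
}
```

Calling reportLoss around the per-iteration work and watching whether the loss accumulates distinguishes a real leak from a one-off allocation such as context setup.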

Is it because the kernel is called 512 times and the previous kernel has possibly not finished, so the pointer hasn't been returned to the host to be freed?
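On the timing question: deviceScats is an ordinary host variable that holds a device address; the kernel receives a copy of that value, so nothing has to be "returned" to the host. Assuming everything runs on the default stream, a blocking cudaMemcpy from device to host does not start until the preceding kernel has finished. The sketch below shows one complete call with error checks after each step. doubleKernel and runOnce are hypothetical stand-ins, since the body of launchKernel is not shown; the explicit cudaDeviceSynchronize is only there so errors raised while the kernel runs are reported at that point.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for launchKernel: doubles each element.
__global__ void doubleKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// One call: allocate, copy in, launch, copy out, free.
// Returns the first CUDA error encountered, or cudaSuccess.
static cudaError_t runOnce(const float *hostIn, float *hostOut, int n)
{
    assert(n > 0);
    float *deviceIn = nullptr, *deviceOut = nullptr;
    size_t bytes = sizeof(float) * n;

    cudaError_t err = cudaMalloc((void **)&deviceIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&deviceOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(deviceIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        doubleKernel<<<(n + 511) / 512, 512>>>(deviceIn, deviceOut, n);
        err = cudaGetLastError();                       // launch-configuration errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // errors raised while the kernel ran
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, deviceOut, bytes, cudaMemcpyDeviceToHost);

    // Pass the device addresses themselves, not &deviceIn / &deviceOut.
    // cudaFree(nullptr) is a no-op, so this is safe even if an allocation failed.
    cudaError_t freeIn = cudaFree(deviceIn);
    cudaError_t freeOut = cudaFree(deviceOut);
    if (err == cudaSuccess) err = (freeIn != cudaSuccess) ? freeIn : freeOut;
    return err;
}
```

Because each call frees everything it allocates, free device memory should stay roughly level across repeated calls.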

Can someone explain why I'm getting this error when the pointer is passed to the device?

This code gets executed 512 times via a for loop in MATLAB:

...
float *scats1D;
scats1D = (float *)mxMalloc(999*sizeof(float));
... scats1D is populated ...
...
float *deviceScats;
...
cuda_ret = cudaMalloc( (void **) &deviceScats, sizeof(float)*999);
...
cuda_ret = cudaMemcpy(deviceScats, scats1D, sizeof(float)*999, cudaMemcpyHostToDevice);
launchKernel<<<1,512>>>(deviceScats, ...., results);
... cudaMemcpy results back to host ...
cuda_ret = cudaFree(&deviceScats); // <-- invalid device pointer.
... free all other allocated memory ...
cuda_ret = cudaFree(deviceScats);
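A likely cause, sketched under the usual CUDA runtime rules: cudaFree takes the device address returned by cudaMalloc (cudaFree(void *devPtr)), which is the value stored in deviceScats. &deviceScats is instead the address of the host variable itself, a float ** that converts silently to void *, so it compiles, but the runtime does not recognize it as an allocation and returns an error (the exact error code may differ between CUDA versions). The 999-float buffer is then never released, so each of the 512 calls leaks one allocation. If that call is corrected, the final cudaFree(deviceScats) would free the same allocation a second time. The helper freeAndClear below is hypothetical; it clears the pointer after freeing so that a repeated call becomes a no-op.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: frees *pp and clears it, so a later second call
// (like the extra cudaFree at the end of the snippet) does nothing.
static cudaError_t freeAndClear(float **pp)
{
    assert(pp != nullptr);
    cudaError_t err = cudaFree(*pp);  // *pp is the device address from cudaMalloc
    *pp = nullptr;                    // cudaFree(nullptr) performs no operation
    return err;
}
```

Usage: call freeAndClear(&deviceScats). Here the & is correct because the helper needs to reset the host variable, and it passes the stored device address, not that host address, to cudaFree.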