Memory allocation and deallocation inside a for loop

Hi,
this is my problem. I have a for loop in the host code. Inside it I work on two memory structures whose sizes change at each loop iteration. The sizes are computed inside the loop, so I cannot allocate the data structures before this computation. My code works as follows:

for (int i = 0; i < iter; i++)
{
    cudaMalloc((void**)&dA, 9 * sizelz * sizeof(float)); // sizelz and L change at each iteration
    cudaMalloc((void**)&dx, L * sizeof(float));

    culaDeviceSgels('N', sizelz, 9, 1, dA, sizelz, dx, max(sizelz, 9));
    cudaMemcpyToSymbol(dcoeff, dx, 9 * sizeof(float), 0, cudaMemcpyDeviceToDevice);

    /* ... other operations ... */
    cudaThreadSynchronize();
    cudaFree(dx);
    cudaFree(dA);
}

The problem is that when I launch the program many times, I sometimes get an Out of memory error or a CUDA unknown error, so I think there is a memory leak somewhere in my code.
I would like to avoid this dynamic allocation by allocating dA and dx outside the loop with the maximum size they can reach, but if I do this, how can I use only the first sizelz elements of my buffers? I mean, I would call my CULA routine using only the first sizelz columns of dA and the first sizelz elements of dx. How can I do this? Any suggestions?

Thanks in advance

What is culaDeviceSgels(…)? A wrapper for your kernel call?

I recommend you check the return values of the CUDA function calls. For example:

#include <cstdio> // for printf

void Check_CUDA_Error(const char *message)
{
    // Report the last CUDA error, if any, together with a caller-supplied message
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess)
        printf("ERROR: %s: %s\n", message, cudaGetErrorString(error));
}

...

cudaMalloc((void**)&dA, 9 * sizelz * sizeof(float)); // sizelz and L change at each iteration
Check_CUDA_Error("cudaMalloc, dA");

...
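
Since cudaMalloc and cudaFree both return a cudaError_t, you can also check that return value directly. A minimal sketch, reusing your dA and sizelz variables:

cudaError_t err = cudaMalloc((void**)&dA, 9 * sizelz * sizeof(float));
if (err != cudaSuccess)
    printf("cudaMalloc dA failed: %s\n", cudaGetErrorString(err));

// later, when releasing the buffer
err = cudaFree(dA);
if (err != cudaSuccess)
    printf("cudaFree dA failed: %s\n", cudaGetErrorString(err));

This way you see immediately which allocation or free is the one that fails.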

That said, you should check the size limits you use when reading/writing data from/to the buffers inside your kernel. Could you post a reduced example of your kernel and how you compute the indexes used to access the data?
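
About allocating dA and dx outside the loop: since you already pass the actual sizes (sizelz, lda, ldb) to culaDeviceSgels, the routine should only touch that part of the buffers, so over-allocating them is fine as long as you keep packing dA with a stride of sizelz. A minimal sketch, assuming hypothetical upper bounds MAX_SIZELZ and MAX_L that you compute beforehand (the other variables are the ones from your loop):

// Sketch only: MAX_SIZELZ and MAX_L are assumed upper bounds, not from the original code
float *dA = NULL, *dx = NULL;
cudaMalloc((void**)&dA, 9 * MAX_SIZELZ * sizeof(float)); // allocated once, at the maximum size
cudaMalloc((void**)&dx, MAX_L * sizeof(float));

for (int i = 0; i < iter; i++)
{
    // ... compute sizelz and L, refill the first 9*sizelz elements of dA and the first L of dx ...

    // Same call as before: m = sizelz and lda = sizelz tell the routine how much of dA to use,
    // so the unused tail of the over-allocated buffers is simply ignored
    culaDeviceSgels('N', sizelz, 9, 1, dA, sizelz, dx, max(sizelz, 9));
    cudaMemcpyToSymbol(dcoeff, dx, 9 * sizeof(float), 0, cudaMemcpyDeviceToDevice);

    /* ... other operations ... */
    cudaThreadSynchronize();
}

cudaFree(dx);
cudaFree(dA);

This removes the per-iteration cudaMalloc/cudaFree pair, so if an out-of-memory error remains it will show up at one well-defined point instead of after many iterations.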