Hi,
this is my problem. I have a for loop in the host code. Inside it I have to work on two memory structures whose sizes change at each loop iteration. The sizes are computed inside the loop, so I cannot allocate the data structures before that computation. My code works as follows:
for (int i = 0; i < iter; i++)
{
    cudaMalloc((void**)&dA, 9 * sizelz * sizeof(float)); // sizelz and L change at each iteration
    cudaMalloc((void**)&dx, L * sizeof(float));
    culaDeviceSgels('N', sizelz, 9, 1, dA, sizelz, dx, max(sizelz, 9));
    cudaMemcpyToSymbol(dcoeff, dx, 9 * sizeof(float), 0, cudaMemcpyDeviceToDevice);
    /* …other operations… */
    cudaThreadSynchronize();
    cudaFree(dx);
    cudaFree(dA);
}
The problem is that sometimes, when I launch the program many times, I get an "out of memory" error or a CUDA "unknown error", so I suspect there is a memory leak in my code.
I would like to avoid this dynamic allocation by allocating dA and dx once, outside the loop, with the maximum size they can reach. But if I do this, how can I use only the first sizelz elements of my buffers? I mean, I would like to call my CULA routine using only the first sizelz columns of dA and the first sizelz elements of dx. How can I do this? Any suggestions?
Thanks in advance