How do I clear my GPU memory?

I am running GPU code in CUDA C, and every time I run it, GPU memory utilization increases by 300 MB. My GPU card has 4 GB of memory.
I have to call this CUDA function from a loop 1000 times, and since each iteration consumes that much memory, my program core dumps after 12 iterations. I am using cudaFree to release my device memory after each iteration, but it doesn't seem to actually free the memory.
Please suggest a way to overcome this problem and free my GPU memory after each iteration.

Any suggestion would help. Thank you.

Look carefully - you probably forgot to free some buffers you allocated.

What do you mean by buffers?
I have now structured my code so that all device memory is allocated before the loop starts, so that memory on the device is allocated only once.
I am using cusolverDnSgeqrf_bufferSize(), cusolverDnSgeqrf(), cusolverDnSormqr() and cublasStrsm() in my loop. Do any of them consume memory without freeing it?

I think the problem is in cusolverDnSgeqrf_bufferSize(), which allocates memory for the workspace. But I don't think it clears out the workspace memory in every iteration.
Someone please help. If this is the problem, then please tell me how to free the memory allocated by this function?

geqrf_bufferSize, by itself, does not allocate (or free) anything.

according to the cusolver documentation for geqrf:

http://docs.nvidia.com/cuda/cusolver/index.html#cuds-lt-t-gt-geqrf

“The user has to provide working space which is pointed by input parameter Workspace. The input parameter Lwork is size of the working space, and it is returned by geqrf_bufferSize().”

The usual method to do that would be to call geqrf_bufferSize, and then take the returned value provided in Lwork, and use that as the size parameter in a call to cudaMalloc. The allocated pointer from cudaMalloc is then passed to the actual geqrf function.
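In code, that sequence typically looks something like the following minimal sketch (the variable names are mine; it assumes a cusolverDnHandle_t handle and an m x n float matrix d_A already resident on the device, and it omits error checking for brevity):

int work_size = 0;
// bufferSize only reports the required workspace size; it allocates nothing
cusolverDnSgeqrf_bufferSize(handle, m, n, d_A, m, &work_size);

float *d_work = NULL;
cudaMalloc((void **)&d_work, sizeof(float) * work_size);

float *d_tau = NULL;
cudaMalloc((void **)&d_tau, sizeof(float) * (m < n ? m : n));

int *d_info = NULL;
cudaMalloc((void **)&d_info, sizeof(int));

cusolverDnSgeqrf(handle, m, n, d_A, m, d_tau, d_work, work_size, d_info);

cudaFree(d_work); // the matching free for the workspace allocation
cudaFree(d_tau);
cudaFree(d_info);

If the cudaMalloc for the workspace ends up inside your loop without its matching cudaFree, every iteration leaks that allocation.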

If you are doing that repetitively in a loop, without a corresponding cudaFree call, then that would be a problem. It's impossible to say without seeing your code. If you are not doing it in the loop, it's unclear why you would need to call geqrf_bufferSize in the loop (what purpose would it serve to call it but not act on its returned information?) And if you are doing this before the loop, the only way that could be sensible is if your A matrix didn't change during the loop processing. But if that were the case, why would you need to do Q/R factorization on it repetitively in the loop?

I would suggest the information you’ve provided so far is incomplete and confusing.

Hello!
Sorry for the late reply and for the lack of information in my question. I am pasting part of my code here. Please tell me where I am getting it wrong.

So, in this code I believe I free all the allocated device memory with cudaFree, which is only one variable. I ran this loop 20 times and found that my GPU memory increases after each iteration until the program finally core dumps. All the variables I pass as input to this function are declared outside the loop.
Also, I have to calculate this solution for different matrices (all of the same size), but currently I am running the loop with the same matrix every time, just to check whether my code works. Please help me out here, and let me know if you need any more information about my code.

Also, is the work size returned by geqrf_bufferSize() the same for matrices of the same dimensions?

The first problem is that you should always use proper CUDA error checking, any time you are having trouble with a CUDA code. As a quick test, you can also run your code with cuda-memcheck (do that too.)

This is not correct:

cudaFree(&work);

It should be:

cudaFree(work);

As a result, you weren’t freeing anything, since you weren’t passing the correct pointer to cudaFree. If you had used proper CUDA error checking, you would know this already.

Not sure what proper CUDA error checking is? Google “proper CUDA error checking” and take the first hit, and start reading it, and apply it to your code. The CUSOLVER, CUBLAS and other library calls also return error codes, and your code shows no indication of checking that either.

Did I mention to do proper CUDA error checking?

You should always do proper CUDA error checking, any time you are having trouble with a CUDA code, preferably before asking others for help. It is not sensible to ignore information that the CUDA runtime is providing you to help understand your code.
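If it helps, here is a minimal sketch of one such check (the macro is my own, not part of the CUDA toolkit):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Wrap any CUDA runtime call; abort with file/line info on failure
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

Wrapping your calls this way would have flagged the bug above immediately: CUDA_CHECK(cudaFree(&work)); aborts with an invalid-value error, because &work is not a pointer that cudaMalloc returned, while CUDA_CHECK(cudaFree(work)); succeeds.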

Thank you very much for pointing out this error. But even after correcting it, my GPU memory increases after every iteration. I don't think the variable "work" is the problem; it is a small array of only 13 elements (I got that by printing work_size in the output).
Also, I take your point about checking CUDA errors, but I am getting the right answer, so I think the function is working fine.
Does CUDA error checking also give information about memory usage?

You’re also creating a cublas handle:

cublasCreate(&cublas_handle);

but not destroying it (that I can see). That will chew up memory I believe.
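The usual pattern is to create each handle once before the loop and destroy it once after it (a sketch using this thread's handle name; the cusolver handle here is my assumption):

cublasHandle_t cublas_handle;
cusolverDnHandle_t cusolver_handle;
cublasCreate(&cublas_handle);
cusolverDnCreate(&cusolver_handle);

for (int i = 0; i < 1000; ++i) {
    // geqrf / ormqr / trsm calls, all reusing the same handles
}

cublasDestroy(cublas_handle); // releases the library's internal allocations
cusolverDnDestroy(cusolver_handle);

If cublasCreate is called inside the loop without a matching cublasDestroy, each iteration permanently consumes device memory for the library's internal state.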

Thanks a lot man for pointing out the error. It worked like a charm.