I noticed that cusolverDnSgetrf_bufferSize, cusolverDnDgetrf_bufferSize, cusolverDnCgetrf_bufferSize, and cusolverDnZgetrf_bufferSize all take the matrix being factored as an argument. So, for example:
```c
cusolverStatus_t
cusolverDnZgetrf_bufferSize(cusolverDnHandle_t handle,
                            int m,
                            int n,
                            cuDoubleComplex *A,
                            int lda,
                            int *Lwork);
```
The pointer to the complex matrix A is one of the arguments. This means that I can only know the size of the needed workspace after I have allocated the complex matrix A!
What I want to do is calculate ahead of time if I have enough GPU memory to hold the complex matrix A and the workspace.
Am I misunderstanding the use of cusolverDnZgetrf? How can I do what I want to do?
The amount of GPU memory needed to hold the A matrix itself is just m * n * sizeof(cuDoubleComplex).
This function is used to calculate the workspace size. This is extra space (besides that allocated for the A matrix) that is needed by the function to perform its work.
The size of this space may depend on the structure of the A matrix, and so you must provide the complete A matrix before the buffer size can be calculated.
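So the intended call sequence is: allocate and populate d_A on the device first, then query the workspace size, then allocate the workspace. A hedged sketch of that order of operations (error checking omitted; requires a CUDA-capable GPU and the cusolver library, so it is not runnable as-is here):

```c
/* Sketch only: assumes handle creation succeeded and h_A holds the host matrix. */
cusolverDnHandle_t handle;
cusolverDnCreate(&handle);

cuDoubleComplex *d_A;
cudaMalloc((void **)&d_A, sizeof(cuDoubleComplex) * (size_t)lda * n);
cudaMemcpy(d_A, h_A, sizeof(cuDoubleComplex) * (size_t)lda * n,
           cudaMemcpyHostToDevice);

/* Workspace size can only be queried once d_A exists. */
int lwork = 0;
cusolverDnZgetrf_bufferSize(handle, m, n, d_A, lda, &lwork);

cuDoubleComplex *d_work;
int *d_ipiv, *d_info;
cudaMalloc((void **)&d_work, sizeof(cuDoubleComplex) * lwork);
cudaMalloc((void **)&d_ipiv, sizeof(int) * m);
cudaMalloc((void **)&d_info, sizeof(int));

cusolverDnZgetrf(handle, m, n, d_A, lda, d_work, d_ipiv, d_info);
```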
Thanks for the reply, Robert.
Your answer would make sense for the sparse case, but since the matrix A is dense, the structure of A is already known. In other words, cusolverDnZgetrf_bufferSize should know the structure of A without my having to pass it the actual matrix A.
By “the structure of A” I am referring to characteristics of A which may affect the algorithm choices/path. Not the sparsity pattern.
Anyway, that is the behavior of the API. There is no alternate method provided. If you’d like to see a change in any CUDA behavior, you should file a bug.
Thanks, Robert, I submitted a bug report.