Memory allocation error for Cholesky factorization in the cuSolver library

I’m trying to use cusolverSpDcsrlsvchol() to solve a linear system whose matrix is 1M x 1M with 5M nonzeros. However, the function returns CUSOLVER_STATUS_ALLOC_FAILED, which means a memory allocation failed inside that function.
Is there a way to solve this problem, or a way to find out how much more memory this function needs?

The GPU I use is a Tesla V100 with 32 GB of memory.

The error you are hitting may indicate a limitation in the algorithm that shows up when there is large zero fill-in in the matrix G. Having more memory would not allow you to work around this limitation.

Referring to the documentation:

“Remark 2: the function only works on 32-bit index, if matrix G has large zero fill-in such that number of nonzeros is bigger than 2^31, then CUSOLVER_STATUS_ALLOC_FAILED is returned.”

The definition of zero fill-in:

You may wish to read that section; it gives more than just the definition of zero fill-in.

I’ve hit the same error when using a Titan V with 12 GB of RAM.

In my case the matrix is 523797 x 523797 and the number of nonzeros is 21648079, which is orders of magnitude below the 2^31 limit.

For comparison, a smaller problem with a 155016 x 155016 matrix and 6653340 nonzeros runs fine.

So the number of nonzeros in both cases is <<2^31 and the number of zeros is >>2^31. What else may be going wrong?

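Zero fill-in could still be the issue, as already mentioned in this thread. The limit quoted from the documentation applies to the nonzeros of G after fill-in, not to the nonzeros of the input matrix.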

Well, in that case I would strongly encourage your CUDA development team to implement a cusolver version with 64-bit integers, because the current version is pretty useless for solving any problems that take more than a few seconds.

Consider filing an enhancement request via the bug reporting page. Prefix the synopsis with “RFE:” to mark it as an enhancement request.

Filing an enhancement request doesn’t guarantee that a desired feature will materialize any time soon, but without one the likelihood is high that it will never come into existence.

Agreed, an enhancement request is a good way to get your voice heard.

Also, the 32-bit integer limitation, applied to double-precision values, translates to a limit of 16GB for the temporary storage (really, potentially larger than that, as I am only considering the non-zero values here and no other arrays, such as indexing). 64-bit integers wouldn’t dramatically change the possible sizes at present, even on the largest GPUs.
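To make the 16GB figure concrete, here is a back-of-the-envelope sketch in Python. It only counts the double-precision values (8 bytes each) and ignores index arrays and any other workspace, so it is a lower bound rather than a statement about what the library actually allocates. The function name and the list of cards are just for illustration.

```python
# Rough arithmetic for the 32-bit ceiling discussed above.
# Assumption: only the double-precision nonzero values are counted;
# index arrays and solver workspace are ignored.
INT32_NNZ_LIMIT = 2**31      # nonzero limit quoted in the documentation remark
BYTES_PER_DOUBLE = 8


def value_bytes(nnz: int) -> int:
    """Bytes needed to store `nnz` double-precision values."""
    return nnz * BYTES_PER_DOUBLE


# Storage for the values alone when the nonzero count reaches 2^31.
ceiling_gib = value_bytes(INT32_NNZ_LIMIT) / 2**30
print(f"values at the 32-bit limit: {ceiling_gib:.0f} GiB")

# For each card, how many doubles fit if the whole memory held nothing else,
# and whether the card runs out of memory before the 32-bit limit is reached.
cards = [("Titan V", 12), ("Tesla V100", 32)]
memory_bound = {}
for name, mem_gib in cards:
    max_values = mem_gib * 2**30 // BYTES_PER_DOUBLE
    memory_bound[name] = max_values < INT32_NNZ_LIMIT
    print(name, max_values, "memory-bound first:", memory_bound[name])
```

Under these assumptions a 12GB card runs out of memory before the 2^31 count is reached, while a 32GB card could in principle reach the index limit first.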

On your 12GB Titan, if the zero fill-in is hitting a ceiling, that ceiling is your memory size, not the 32-bit integer limitation.

Since there are no provided examples in this thread, it is impossible to know if zero fill-in is actually the problem (or what the problem actually is), as that depends on matrix structure, which cannot be inferred from simple descriptions like dimensions or the number of non-zeros.
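As a small illustration of why dimensions and nonzero counts are not enough, here is a toy symbolic Cholesky elimination in pure Python. It is not how cuSolver works internally; it is a textbook-style sketch that only tracks the sparsity pattern. The function name cholesky_factor_nnz and the two "arrow" test patterns are made up for this example. Both patterns have the same size and the same number of nonzeros, and differ only in where the dense row/column sits.

```python
def cholesky_factor_nnz(n, edges):
    """Count nonzeros of the Cholesky factor (diagonal included) by
    symbolic elimination in natural order, with no reordering.

    `edges` lists the off-diagonal (i, j) positions of a symmetric pattern.
    """
    adj = [set() for _ in range(n)]
    for i, j in edges:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)

    nnz = 0
    for k in range(n):
        # Rows below the diagonal that are nonzero in column k of the factor.
        higher = [v for v in adj[k] if v > k]
        nnz += 1 + len(higher)
        # Eliminating k couples all of its remaining neighbours pairwise;
        # any pair that was not already coupled is fill-in.
        for a in higher:
            for b in higher:
                if a != b:
                    adj[a].add(b)
    return nnz


n = 100
# Arrow pattern with the dense row/column first: index 0 touches everything.
hub_first = [(0, j) for j in range(1, n)]
# Same pattern with the dense row/column last: index n-1 touches everything.
hub_last = [(j, n - 1) for j in range(n - 1)]

nnz_first = cholesky_factor_nnz(n, hub_first)
nnz_last = cholesky_factor_nnz(n, hub_last)
print("dense row first:", nnz_first)
print("dense row last: ", nnz_last)
```

With the dense row first, the factor fills in completely (n(n+1)/2 entries); with it last, there is no fill-in at all. This is why a fill-reducing ordering matters so much, and why the actual matrix is needed to say whether fill-in explains the failure.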

Yes, I’ll talk to the people I’ve worked with on the CUDA development team and will file the enhancement request.

The Titan card is not the only one we’ve got. We also bought some Tesla V100s with 32 GB each, so waiving the 32-bit limit would make sense there.

Also, I think I can share the model that I used for benchmarking. It’s about 400 MB in MatrixMarket format and only about 105 MB when zipped.