How to perform eigenvalue decompositions for large matrices with Python

I have dense, symmetric matrices of size roughly 5e4x5e4 to 1e5x1e5 whose eigenvalues I want to compute. I have looked at CuPy (cupy.linalg.eigvalsh), but I run out of memory on a single GPU when using it. I know cuSOLVER has a multi-GPU extension; are there any Python libraries that wrap it? Or is there some other way to go about this?
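
For concreteness, this is roughly what I am running (a minimal sketch with a random stand-in matrix; the real matrices are built elsewhere):

```python
import cupy as cp

n = 52_500                          # 52500 x 52500 float64 is ~22 GB

# Stand-in for the real matrix (the actual data is dense and symmetric).
# eigvalsh only reads one triangle (UPLO='L' by default), so for a memory
# test the stand-in does not need to be explicitly symmetrized.
a = cp.random.rand(n, n, dtype=cp.float64)

# Eigenvalues only -- this is the call that runs out of memory on a single GPU
w = cp.linalg.eigvalsh(a)
```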


Rent the hardware from one of the big cloud providers, maybe? For example, AWS appears to offer instances with A100 and H100 acceleration. I don’t know what the pricing looks like. Since these instances are probably popular with the AI crowd, I would imagine fairly stiff pricing.

Running out of memory is always a problem for linear algebra. Are the matrices sparse or banded?
You probably want to compute in double precision?
You have to find an algorithm that suits your matrix type and numerical stability requirements.

You would have to find a way to distribute the constituent operations across multiple GPUs.

Can you accept the longer execution time due to the additional memory transfers?

Workstation-class graphics cards support NVLink, which can be faster than PCIe.

I have access to some A100/H100 GPUs through my university. I initially tried an A100 (40GB) GPU. The array (52500x52500) is ~22GB, and CuPy wants to allocate twice that (presumably to store the eigenvectors internally), which was too much for that GPU. I then tried an A100 (80GB) GPU, which was able to allocate the memory but then ran into a CUSOLVER_STATUS_INVALID_VALUE error. I think this is because the total number of array elements exceeds the maximum 32-bit int value. Does CuPy use the 64-bit cuSOLVER API?
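
For reference, the back-of-the-envelope numbers (plain arithmetic):

```python
n = 52_500
bytes_per_double = 8

matrix_bytes = n * n * bytes_per_double
print(matrix_bytes / 1e9)           # ~22.1 GB for the input matrix
print(2 * matrix_bytes / 1e9)       # ~44.1 GB if the eigenvectors are stored as well

# Element count vs. the 32-bit int limit used by the legacy cuSOLVER API
print(n * n)                        # 2_756_250_000
print(2**31 - 1)                    # 2_147_483_647 -> n*n overflows a 32-bit int
```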

Can you confirm that smaller matrix sizes work?

Perhaps you could try running one of the C++/CUDA symmetric-matrix examples, modified for your matrix size and with batch size 1, to see whether it is a cuSOLVER or a CuPy limitation.

You would also see better how much memory really has to be allocated.
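
If you'd rather stay in Python for a first check, a rough sketch using CuPy's memory pool counters (not the C++ sample, so it won't separate cuSOLVER limits from CuPy limits, but it shows at which size things fail and how much memory is held at that point) could look like this:

```python
import cupy as cp

pool = cp.get_default_memory_pool()

# Step the size up towards the failing case and watch how much memory the
# pool ends up holding, to see where eigvalsh starts to fail and with what.
for n in (5_000, 10_000, 20_000, 40_000, 52_500):
    pool.free_all_blocks()
    a = cp.random.rand(n, n, dtype=cp.float64)   # only one triangle is read
    try:
        cp.linalg.eigvalsh(a)                    # eigenvalues only
        status = "ok"
    except Exception as exc:                     # OutOfMemoryError, CUSOLVERError, ...
        status = type(exc).__name__
    print(f"n={n}: {status}, pool holds {pool.total_bytes() / 2**30:.1f} GiB")
    del a
```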

Yes, I can confirm that smaller matrix sizes work. I also came across cusolver error: CUSOLVER_STATUS_INVALID_VALUE for large matrices when trying `linalg.eigh` · Issue #7261 · cupy/cupy · GitHub, which seems to be the same problem I am hitting.

I will also try out the pytorch solver to see if it supports large array sizes.
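
A sketch of what I'll try (again with a random stand-in; torch.linalg.eigvalsh also reads only one triangle by default):

```python
import torch

n = 52_500                          # same size as before, ~22 GB in float64

# Stand-in for the real matrix, in double precision on the GPU
a = torch.rand(n, n, dtype=torch.float64, device="cuda")

# Eigenvalues only; on CUDA tensors this should dispatch to cuSOLVER
w = torch.linalg.eigvalsh(a)
```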

cuSOLVER has a legacy 32-bit API and a newer 64-bit API (since CUDA 11.0?).

In the 32-bit API, the helper functions return the required buffer size, as a number of elements of whatever type you use (e.g. doubles), through an int* lWork output parameter; the actual computation functions then take the buffer size as an int input parameter.

The 64-bit API uses size_t parameters.

You can either access the 64-bit API directly (from C++) or use a Python library that calls the newer 64-bit API (if there is one).

I believe pytorch uses the 64-bit API. I tried this, and I don’t get a cuSOLVER error, but I get an out of memory error. It says it tried to allocate ~66GB after already allocating ~44GB. I don’t know why it needs this much space since the array is only 22GB.
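
The sizes do at least look consistent with everything scaling as n^2; this is just arithmetic on the reported numbers plus a guess that the extra request is workspace, not anything I have confirmed in the PyTorch source:

```python
n = 52_500
matrix_gb = n * n * 8 / 1e9         # ~22.1 GB for one n x n float64 matrix

print(44 / matrix_gb)               # ~2.0 -> the ~44 GB already allocated looks like
                                    #         two matrices (input + a copy/output?)
print(66 / matrix_gb)               # ~3.0 -> the ~66 GB request looks like three more
                                    #         matrices, i.e. an O(n^2) workspace?
```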

I think I’ll have to resort to using C++ and directly interfacing with the cuSOLVER API for this at the end of the day.

There was also a bug at one point (in CUDA 10.1) where a quadratic buffer size requirement was returned, but that should have been fixed by now (in 11.0).

sgesvd_bufferSize int32 overflow with CUDA 10.1 · Issue #2351 · cupy/cupy · GitHub (that issue describes both problems)

See pytorch/aten/src/ATen/native/cuda/linalg/CUDASolver.cpp at main · pytorch/pytorch · GitHub

Or Add cusolver potrf and potrfBatched to the backend of torch.cholesky decomposition by xwang233 · Pull Request #53104 · pytorch/pytorch · GitHub, where the 64-bit API was mentioned and introduced.