Hello NVIDIA Community and Developers,
I’m reaching out as an indirect user of the cuSOLVER library, which I use through JAX for batched Singular Value Decomposition (SVD) operations. My work often relies on the cusolverDn<t>gesvdjBatched() function. I’ve noticed that, starting with CUDA 10, this function restricts matrix dimensions to m <= 32 and n <= 32. Given the substantial advances in GPU hardware and capabilities since then, I’m curious about the underlying reasons for these size constraints, and whether there are discussions or plans to lift or expand them.
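For context, here is a minimal sketch of the kind of workload I mean (my own illustration, not from the cuSOLVER docs): as I understand it, on an NVIDIA GPU jax.numpy.linalg.svd over a batch of small matrices can be lowered to the batched Jacobi kernel, while on CPU it falls back to LAPACK, so the snippet runs anywhere.

```python
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
# A batch of 8 matrices of shape 16x16 -- within the m <= 32, n <= 32 limit
# that gesvdjBatched imposes.
a = random.normal(key, (8, 16, 16))

# full_matrices=False returns the thin factorization for each matrix in the batch.
u, s, vh = jnp.linalg.svd(a, full_matrices=False)

# Sanity check, batch-wise: U @ diag(s) @ Vh should recover A.
recon = u @ (s[..., None] * vh)
err = float(jnp.max(jnp.abs(recon - a)))
```

The batch dimensions and shapes here are arbitrary; the point is that once m or n exceeds 32, the same call can no longer use the batched Jacobi path.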
While I am aware of the cusolverDn<t>gesvdaStridedBatched() function for general matrices, it hasn’t seen widespread adoption in frameworks such as JAX. I’d like to understand whether there are specific reasons for this limited adoption, and what alternative strategies the community recommends for handling larger matrix dimensions.
If there are ongoing efforts or future plans to address these issues, I’d be very interested to hear about them. Such enhancements would greatly benefit a broad range of applications and researchers like myself.
Thank you for any insights, suggestions, or updates on this matter.