The Origin of the m <= 32 and n <= 32 Limitations in gesvdjBatched?

Hello NVIDIA Community and Developers,

I’m reaching out as an indirect user of the cuSOLVER library, via JAX, for batched Singular Value Decomposition (SVD) operations. My work often involves the cusolverDn&lt;t&gt;gesvdjBatched() function. I’ve noticed that, since CUDA 10, this function restricts matrix dimensions to m <= 32 and n <= 32. Given the substantial advances in GPU hardware and capabilities since then, I’m curious about the underlying reasons for these size constraints, and whether there are discussions or plans to lift or relax them.
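For context, here is a minimal sketch of the batched-SVD pattern I mean. NumPy is used purely for illustration; my understanding (an assumption, not something I've verified in the XLA source) is that JAX's jnp.linalg.svd has the same batching semantics, and that on the GPU path it is for batches of matrices within the m <= 32, n <= 32 limit that cuSOLVER's gesvdjBatched can be used.

```python
import numpy as np

# A batch of 64 small matrices, each 8 x 8 -- within gesvdjBatched's
# m <= 32, n <= 32 limit, so (per my assumption above) a GPU framework
# could dispatch this to cusolverDn<t>gesvdjBatched().
rng = np.random.default_rng(0)
batch = rng.standard_normal((64, 8, 8))

# Batched SVD: U, S, Vh are computed independently for each matrix
# in the leading batch dimension.
U, S, Vh = np.linalg.svd(batch)

# Reconstruct each matrix from its factors to check the decomposition:
# A_i = U_i @ diag(S_i) @ Vh_i, done batch-wise via broadcasting.
recon = U @ (S[..., None] * Vh)
print(np.allclose(recon, batch))  # prints True
```

With, say, 33 x 33 matrices in the batch, the same user-level call can no longer map onto gesvdjBatched, which is exactly the limitation I'm asking about.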

While I am aware of the cusolverDn&lt;t&gt;gesvdaStridedBatched() function, which handles general matrix sizes, it hasn’t seen widespread adoption in frameworks such as JAX. I’d like to understand whether there are specific reasons for this limited adoption, and what alternative strategies the community recommends for handling larger matrix dimensions.

If there are ongoing efforts or future plans to address these issues, I’d be very interested to hear about them. Such enhancements would greatly benefit a broad range of applications and researchers like myself.

Thank you for any insights, suggestions, or updates on this matter.

As an additional note: I’ve come across discussions suggesting that ROCm and hipSOLVER (AMD’s equivalents) support arbitrary matrix sizes in their batched gesvdj function, at least judging from a pull request on CuPy’s GitHub: Support batched SVD by leofang · Pull Request #4628 · cupy/cupy · GitHub. It might be interesting to see how NVIDIA’s cuSOLVER and AMD’s solution compare on this front.