cuSOLVER: can't compute SVD of tall matrices with CUDA 10.1, bufferSize grows quadratically

Hi all,

I recently upgraded to CUDA 10.1, and I can no longer compute the general SVD of tall matrices, that is, cusolverDnSgesvd with large m and tiny n. After some inspection, I noticed that the size returned by cusolverDnSgesvd_bufferSize grows quadratically with m, whereas I believe the memory needed to compute the SVD should be linear in m.

[Plot: cusolverDnSgesvd_bufferSize values, log scale]

I actually get an overflow for m larger than ~32k, which is not that tall.
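To make the overflow concrete, here is a rough back-of-the-envelope sketch. The only hard fact it relies on is that cusolverDnSgesvd_bufferSize returns the size through an `int *lwork`. The growth model `c * m * m` and the constant `c` are my assumptions, not something taken from cuSOLVER itself, and the helper names are hypothetical. The sketch uses 64-bit arithmetic to find the first m at which such a count no longer fits in a 32-bit int.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical model (an assumption, not cuSOLVER's actual formula):
// the workspace element count grows as c * m * m.
// Computed in 64-bit so the model itself cannot overflow.
std::int64_t modeled_lwork(std::int64_t m, std::int64_t c) {
    return c * m * m;
}

// True if the modeled count no longer fits in the 32-bit int
// that the bufferSize call uses to return lwork.
bool overflows_int(std::int64_t m, std::int64_t c) {
    return modeled_lwork(m, c) > static_cast<std::int64_t>(INT_MAX);
}

// Smallest m at which the modeled count exceeds INT_MAX.
std::int64_t first_overflowing_m(std::int64_t c) {
    std::int64_t m = 1;
    while (!overflows_int(m, c)) {
        ++m;
    }
    return m;
}
```

Under this model, c = 1 would overflow from m = 46341 and c = 2 from m = 32768. The latter is in line with the ~32k threshold I observe, but that is only a plausibility check and says nothing certain about the real formula.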

Is there any known workaround?