The documentation pages for the cuSOLVER SVD routines DnXgesvd, DnXgesvdp and even DnXgesvdr state that U is an array of dimension ldu * m, with ldu not less than max(1,m), independently of the job type (‘A’, ‘S’, etc.).
Is it possible that this memory consumption of m*m entries for U is only necessary for job type ‘A’? In the LAPACK documentation for dgesvd, the size of U depends on the job type.
(Obviously, for tall rectangular matrices with many more rows than columns, a buffer whose size always grows quadratically with the number of rows will quickly exhaust the available RAM.)
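To make the difference concrete, here is a minimal sketch of the U sizes as documented for LAPACK dgesvd (m x m for ‘A’, m x min(m,n) for ‘S’, not referenced for ‘N’ and ‘O’). The helper names `lapack_u_shape` and `u_bytes` are hypothetical and only illustrate the arithmetic; whether cuSOLVER accepts the smaller buffer for ‘S’ is exactly what I am asking, so this is not a statement about the cuSOLVER API.

```python
def lapack_u_shape(jobu, m, n):
    """Return (minimum ldu, number of columns) of U per the LAPACK dgesvd docs.

    Assumption: this mirrors LAPACK only; cuSOLVER may require more.
    """
    jobu = jobu.upper()
    if jobu == "A":          # all m left singular vectors
        return (m, m)
    if jobu == "S":          # only the first min(m, n) left singular vectors
        return (m, min(m, n))
    if jobu in ("N", "O"):   # U is not referenced; ldu >= 1 still applies
        return (1, 0)
    raise ValueError(f"unknown job type: {jobu!r}")


def u_bytes(jobu, m, n, itemsize=8):
    """Bytes needed for U with the minimum ldu (itemsize=8 for double)."""
    ldu, ucol = lapack_u_shape(jobu, m, n)
    return ldu * ucol * itemsize


if __name__ == "__main__":
    m, n = 1_000_000, 100    # tall, skinny example matrix
    for job in ("A", "S"):
        print(job, u_bytes(job, m, n) / 1e9, "GB")
```

For a 1,000,000 x 100 matrix in double precision, the ‘A’ layout needs 8 * 10^12 bytes (8 TB) for U, while the ‘S’ layout needs 8 * 10^8 bytes (0.8 GB).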
Best regards, Frank H.