How can I solve rectangular linear systems in batch?

Hi,

I am trying to solve a batch of dense rectangular linear systems A*x = b.

I have seen that the non-batched version can be solved with a combination of “geqrf”, “ormqr” and “trsm”, as shown in the following example:
https://docs.nvidia.com/cuda/cusolver/index.html#ormqr-example1
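For reference, here is a minimal CPU sketch (pure Python, not cuBLAS or cuSOLVER code) of what the three steps compute for one overdetermined system, assuming full column rank and LAPACK-style packed storage (R in the upper triangle, Householder vectors below the diagonal, scalars in tau). The function names are hypothetical stand-ins for the library routines, and qr[i][k] means row i, column k, whereas the real device arrays are column-major.

```python
import math


def geqrf(a):
    """Householder QR of an m x n matrix (m >= n), LAPACK-style packed output.

    Returns (qr, tau): R sits in the upper triangle of qr; below the diagonal of
    column k sits the essential part of v_k (with an implicit 1 on the diagonal).
    """
    m, n = len(a), len(a[0])
    qr = [row[:] for row in a]
    tau = [0.0] * n
    for k in range(n):
        alpha = qr[k][k]
        xnorm = math.sqrt(sum(qr[i][k] ** 2 for i in range(k + 1, m)))
        if xnorm == 0.0:
            continue  # nothing to eliminate: H_k = I (tau stays 0)
        beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
        tau[k] = (beta - alpha) / beta
        scale = 1.0 / (alpha - beta)
        for i in range(k + 1, m):
            qr[i][k] *= scale  # store v_k below the diagonal
        qr[k][k] = beta  # diagonal entry of R
        # Apply H_k = I - tau_k v_k v_k^T to the trailing columns.
        for j in range(k + 1, n):
            w = tau[k] * (qr[k][j] + sum(qr[i][k] * qr[i][j] for i in range(k + 1, m)))
            qr[k][j] -= w
            for i in range(k + 1, m):
                qr[i][j] -= w * qr[i][k]
    return qr, tau


def ormqr_qt(qr, tau, b):
    """The "ormqr" step: return Q^T b = H_{n-1} ... H_1 H_0 b (H_0 applied first)."""
    m = len(qr)
    c = b[:]
    for k in range(len(tau)):
        w = tau[k] * (c[k] + sum(qr[i][k] * c[i] for i in range(k + 1, m)))
        c[k] -= w
        for i in range(k + 1, m):
            c[i] -= w * qr[i][k]
    return c


def trsm_upper(qr, c):
    """The "trsm" step: back substitution for R x = c[:n]."""
    n = len(qr[0])
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = c[i] - sum(qr[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / qr[i][i]
    return x


def qr_solve(a, b):
    """Least-squares solution of A x = b for m >= n: geqrf, then ormqr, then trsm."""
    qr, tau = geqrf(a)
    return trsm_upper(qr, ormqr_qt(qr, tau, b))
```

For example, qr_solve([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [6.0, 8.0, 10.0]) recovers x ≈ [4.0, 2.0], because that right-hand side is consistent. Note that only the middle step needs the tau array and the stored reflectors; the other two steps correspond to routines that already have batched versions.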

I have found batched versions of “geqrf” and “trsm”:
https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-geqrfbatched
and
https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-trsmbatched
respectively.
However, I cannot find a batched “ormqr” function and therefore cannot complete the solve.

(1) Is there another way to use the batched “geqrf” and “trsm” to complete the solve? (Is there a reason why there is no batched “ormqr”?)
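To make concrete what a batched “ormqr” would have to do, here is a pure-Python sketch of applying Q^T to every right-hand side in a batch. It assumes that the batched “geqrf” output uses the same packed representation as LAPACK geqrf (reflector v_k below the diagonal of column k with an implicit leading 1, plus one tau per column); the function name apply_qt_batched is hypothetical, and qr[i][k] means row i, column k rather than the column-major device layout.

```python
def apply_qt_batched(qr_batch, tau_batch, b_batch):
    """Overwrite each b in b_batch with Q^T b, in place.

    qr_batch[s] holds the packed reflectors of system s (the diagonal and upper
    triangle, i.e. R, are not read), tau_batch[s] holds its Householder scalars.
    Q^T b = H_{n-1} ... H_0 b with H_k = I - tau_k v_k v_k^T, so H_0 goes first.
    """
    for qr, tau, b in zip(qr_batch, tau_batch, b_batch):  # systems are independent
        m = len(qr)
        for k, t in enumerate(tau):
            # w = tau_k * (v_k . b), using the implicit 1 at position k
            w = t * (b[k] + sum(qr[i][k] * b[i] for i in range(k + 1, m)))
            b[k] -= w
            for i in range(k + 1, m):
                b[i] -= w * qr[i][k]
```

As a usage example with a hand-built reflector, v = [1, 1] and tau = 1 give H = I - v v^T, which maps b = [3, 5] to [-5, -3]; a system whose tau is 0 is left unchanged. Since no step reads data from another system, the loop over the batch would map naturally to a custom kernel (for instance one thread block per system), but that is only a suggestion, not an existing cuBLAS routine.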

Alternatively, I found that batched “gels” can solve overdetermined linear systems:
https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gelsBatched
However, batched “gels” does not seem to support underdetermined systems at present.

(2) Does anyone know whether this limitation will be removed soon?
(3) Does anyone know of any other way to solve dense rectangular linear systems in batch?
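For reference, the two cases differ as follows (standard results, assuming A has full rank). The overdetermined case is the least-squares problem that batched “gels” handles; the underdetermined case needs the minimum-norm solution, which uses a QR factorization of A^T instead and still requires applying Q, i.e. an “ormqr”-type step:

```latex
\begin{aligned}
&m \ge n:\quad A = Q\begin{bmatrix} R_1 \\ 0 \end{bmatrix},\qquad
x = R_1^{-1}\,\bigl(Q^{\mathsf T} b\bigr)_{1:n}
&& \text{(least squares)}\\[4pt]
&m < n:\quad A^{\mathsf T} = Q\begin{bmatrix} R_1 \\ 0 \end{bmatrix},\qquad
x = Q\begin{bmatrix} R_1^{-\mathsf T} b \\ 0 \end{bmatrix}
&& \text{(minimum norm)}
\end{aligned}
```

In the second case, A = [R_1^T 0] Q^T, so A x = R_1^T R_1^{-T} b = b, and x lies in the range of A^T, which makes it the solution of smallest norm.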

[Note: I found a post on a similar question from about a year ago:
https://devtalk.nvidia.com/default/topic/1021856/?comment=5200231
However, the answer there only points to non-batched solutions.
Therefore, I thought I should try my luck and ask the question again.]

Best regards,
David