Support for per-multiplication m, n, k, lda, ldb, and ldc in batched gemm

cublasHgemmBatched takes a single set of m, n, k, lda, ldb, and ldc values that applies to every multiplication in the batch. In some workloads, however, these values differ across hundreds of matrices. Is there any plan for cuBLAS to support this?

For example, if there are 5000 matrices with 90 unique (m, n, k, lda, ldb, ldc) parameter sets, is it mandatory to use CUDA streams to run the resulting batched GEMMs in parallel?
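
To make the scenario concrete, here is a minimal host-side sketch of the grouping step this implies, assuming each multiplication is described by its six parameters. `GemmShape` and `group_by_shape` are hypothetical names for illustration, not part of cuBLAS, and the sketch makes no GPU calls. It only shows how the problems would be bucketed so that each bucket could go to one cublasHgemmBatched call.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical descriptor for one multiplication (not a cuBLAS type).
struct GemmShape {
    int m, n, k, lda, ldb, ldc;

    // Lexicographic ordering so the shape can be used as a std::map key.
    bool operator<(const GemmShape& o) const {
        return std::tie(m, n, k, lda, ldb, ldc) <
               std::tie(o.m, o.n, o.k, o.lda, o.ldb, o.ldc);
    }
};

// Buckets problem indices by identical (m, n, k, lda, ldb, ldc).
// Each bucket is a candidate for a single batched GEMM call, since
// every multiplication in it shares the same parameter set.
std::map<GemmShape, std::vector<std::size_t>>
group_by_shape(const std::vector<GemmShape>& problems) {
    std::map<GemmShape, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < problems.size(); ++i) {
        groups[problems[i]].push_back(i);
    }
    return groups;
}
```

With 5000 problems and 90 distinct parameter sets, this produces 90 buckets, so 90 separate batched calls. Whether those calls overlap on the GPU then depends on how they are issued, which is the question about streams.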