cublasLtMatmul with leading dimension (lda) < rows (m)


I am studying @mnicely’s code for multiplying two half precision complex matrices (link), but cannot generalize it to the case where the leading dimension of A is less than the number of rows of A.

Say matrix A1 is m=3 (rows) x k=3 (columns), and just considering the real part:

1 4 7
2 5 8
3 6 9

Then, with column-major format, A1 is stored as an array [1 2 3 4 5 6 7 8 9].

I am interested in the cases like A2 =

1 2 3
2 3 4
3 4 5

That is, where each column is offset by less than the number of rows. Here A2 is stored as an array [1 2 3 4 5]. In terms of the GEMM, we have m=3 and lda=1, where lda is the “leading dimension”, which determines how far we jump through the array from one column to the next (link).

When lda<m, the total number of elements in the array is N = m + (k-1) x lda. For our example, N = 3 + 2*1 = 5.
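As a minimal host-side sketch of this indexing (the helper names `overlapped_size` and `element_at` are made up for illustration; they are not part of @mnicely’s code or of cuBLAS), the storage size and the element lookup could be written as:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of stored elements for an m x k column-major matrix whose
// consecutive columns start lda elements apart (lda may be smaller than m).
std::size_t overlapped_size(std::size_t m, std::size_t k, std::size_t lda) {
    return k == 0 ? 0 : (k - 1) * lda + m;
}

// Element (row i, column j): column j starts at offset j * lda in the array.
int element_at(const std::vector<int>& storage,
               std::size_t i, std::size_t j, std::size_t lda) {
    assert(i + j * lda < storage.size());
    return storage[i + j * lda];
}
```

With the array [1 2 3 4 5], m=3, k=3 and lda=1, this reproduces A2 above; with the array [1 2 ... 9] and lda=3 it reproduces A1.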

So, in @mnicely’s code, I simply change line 339 to:

size_t sizeA = (k - 1) * lda + m; // originally m * k

With lda = m (the original setting, on line 336), everything is fine (because sizeA = m*k in this case). But when I set lda = 8 (the case I am most interested in), I get the following:

CUDA error at …/…/…/ code=7(CUBLAS_STATUS_INVALID_VALUE) “cublasLtMatmul( ltHandle, operationDesc, alpha, Atransform, AtransformDesc, Btransform, BtransformDesc, beta, Ctransform, CtransformDesc, Ctransform, CtransformDesc, nullptr, workSpace, workSpaceSize, stream )”

Line 279 is the cublasLtMatmul call … but I am having no luck tracking down the source of the error. It seems like the new sizeA should handle things.

The documentation for cublasLtMatrixLayoutCreate says that the leading dimension must be >= m … but I think this may be an error (lda<m certainly works with cutlass::gemm::device::GemmBatched).
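If the documented constraint does turn out to be enforced, one possible fallback is to expand the overlapped array into a dense column-major buffer with leading dimension m before the call. The sketch below is host-side only, and `expand_to_dense` is a hypothetical helper, not something from @mnicely’s code or from cuBLASLt. It costs m*k elements of memory instead of (k-1)*lda+m, so it gives up the saving that made lda<m attractive in the first place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy an overlapped column-major matrix (column stride
// lda, possibly smaller than m) into a dense column-major buffer whose
// leading dimension is exactly m.
template <typename T>
std::vector<T> expand_to_dense(const std::vector<T>& src,
                               std::size_t m, std::size_t k, std::size_t lda) {
    // The source must hold at least (k-1)*lda + m elements.
    assert(k == 0 || src.size() >= (k - 1) * lda + m);
    std::vector<T> dense(m * k);
    for (std::size_t j = 0; j < k; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            dense[i + j * m] = src[i + j * lda];
        }
    }
    return dense;
}
```

For the A2 example (array [1 2 3 4 5], m=3, k=3, lda=1) this yields [1 2 3 2 3 4 3 4 5], which can then be described with lda = m.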

Appreciate any suggestions.




Your issue may be a bug. I’ll file a bug report for the cublasLt developers to look into it.


Thanks Matt. Please let me know if I can help with more info, or whatever.

Hi Matt

Is there a bug tracker I can follow?