Hi

I am studying @mnicely’s code for multiplying two half-precision complex matrices (link), but I cannot generalize it to the case where the leading dimension of A is less than the number of rows of A.

Say matrix A1 has m=3 rows and k=3 columns. Considering just the real part, A1 =

1 4 7

2 5 8

3 6 9

Then, with column-major format, A1 is stored as an array [1 2 3 4 5 6 7 8 9].

I am interested in cases like A2 =

1 2 3

2 3 4

3 4 5

That is, cases where each column is offset by less than the number of rows. Here A2 is stored as the array [1 2 3 4 5]. In terms of the GEMM, we have m=3 and lda=1, where lda is the “leading dimension”, which determines how far we jump through the array from one column to the next (link).

When lda<m, the total number of elements in the array is N = m + (k-1)*lda. For our example, N = 3 + 2*1 = 5.
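To make the indexing concrete, here is a minimal host-side sketch in plain C++ (no cuBLAS calls) of the overlapping layout described above. The helper names storage_size, offset, and densify are hypothetical and are not taken from @mnicely’s code. The sketch assumes column-major storage, where element (i, j) lives at index i + j*lda.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of stored elements for an m x k column-major
// view whose columns start lda elements apart. Reduces to m*k when lda == m.
std::size_t storage_size(std::size_t m, std::size_t k, std::size_t lda) {
    assert(m >= 1 && k >= 1);
    return (k - 1) * lda + m;
}

// Hypothetical helper: linear index of element (row i, column j).
std::size_t offset(std::size_t i, std::size_t j, std::size_t lda) {
    return i + j * lda;
}

// Hypothetical helper: copy the (possibly overlapping) view into a dense
// m x k column-major buffer, i.e. one with leading dimension equal to m.
template <typename T>
std::vector<T> densify(const std::vector<T>& a,
                       std::size_t m, std::size_t k, std::size_t lda) {
    assert(a.size() >= storage_size(m, k, lda));
    std::vector<T> dense(m * k);
    for (std::size_t j = 0; j < k; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            dense[i + j * m] = a[offset(i, j, lda)];
        }
    }
    return dense;
}
```

For the A2 example, storage_size(3, 3, 1) gives 5, and densify applied to [1 2 3 4 5] with m=3, k=3, lda=1 gives the nine-element array [1 2 3 2 3 4 3 4 5]. Expanding to a dense buffer like this costs the extra memory that the overlapping layout is meant to avoid, so I would only see it as a fallback if the lda >= m requirement turns out to be enforced.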

So, in @mnicely’s code, I simply change line 339 to:

size_t sizeA = (k-1)*lda+m ; //m * k;

With lda = m (the original setting, on line 336), everything is fine, because sizeA = m*k in this case. But when I set lda = 8 (the case I am most interested in), I get the following:

CUDA error at …/…/…/cublasLt_C16F_TCs.cu:279 code=7(CUBLAS_STATUS_INVALID_VALUE) “cublasLtMatmul( ltHandle, operationDesc, alpha, Atransform, AtransformDesc, Btransform, BtransformDesc, beta, Ctransform, CtransformDesc, Ctransform, CtransformDesc, nullptr, workSpace, workSpaceSize, stream )”

Line 279 is the cublasLtMatmul call … but I am having no luck tracking down the source of the error. It seems like the new sizeA should handle things.

The documentation for cublasLtMatrixLayoutCreate says that the leading dimension must be >= m, but I think this may be an error (lda<m certainly works with cutlass::gemm::device::GemmBatched).

Appreciate any suggestions.

Thanks

Gary