complex FP16 tensor core GEMMs


If someone knows the best (easiest to code) way to do a complex half-precision GEMM using tensor cores, I’d really appreciate any help.

It seems that, about a year ago, this wasn’t possible in CUTLASS (page 4):

At that point, the best approach was to map the complex problem to an equivalent real problem. On the other hand, planar complex GEMMs are mentioned in the latest CUTLASS profiler:

But I suspect (line 139 in the above file) that only the basic GEMM is covered.
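For anyone following along, here is a rough sketch of what I mean by mapping to an equivalent real problem. It is plain host-side C++ with float standing in for FP16, and real_gemm is just a naive placeholder for whatever real-valued tensor core GEMM is actually available. The function names are my own, not from CUTLASS or cublas.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Naive row-major real GEMM: C (m x n) = A (m x k) * B (k x n).
// Placeholder (assumption) for a real-valued tensor core GEMM.
void real_gemm(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C,
               std::size_t m, std::size_t k, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}

// Complex GEMM expressed as one real GEMM of doubled size:
//   [ Ar  -Ai ] [ Br ]   [ Cr ]
//   [ Ai   Ar ] [ Bi ] = [ Ci ]
// A is m x k, B is k x n, both row-major.
std::vector<std::complex<float>> complex_gemm_via_real(
        const std::vector<std::complex<float>>& A,
        const std::vector<std::complex<float>>& B,
        std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> Ae(2 * m * 2 * k), Be(2 * k * n), Ce(2 * m * n);

    // Build the (2m x 2k) real embedding of A.
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float re = A[i * k + p].real();
            const float im = A[i * k + p].imag();
            Ae[i * (2 * k) + p] = re;
            Ae[i * (2 * k) + k + p] = -im;
            Ae[(m + i) * (2 * k) + p] = im;
            Ae[(m + i) * (2 * k) + k + p] = re;
        }
    }

    // Stack the real part of B on top of its imaginary part (2k x n).
    for (std::size_t p = 0; p < k; ++p) {
        for (std::size_t j = 0; j < n; ++j) {
            Be[p * n + j] = B[p * n + j].real();
            Be[(k + p) * n + j] = B[p * n + j].imag();
        }
    }

    real_gemm(Ae, Be, Ce, 2 * m, 2 * k, n);

    // Top half of Ce holds Cr, bottom half holds Ci.
    std::vector<std::complex<float>> C(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            C[i * n + j] = {Ce[i * n + j], Ce[(m + i) * n + j]};
        }
    }
    return C;
}
```

The embedded A stores every element twice, so this trades extra memory and a packing step for being able to reuse a plain real GEMM.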

CUTLASS aside, it seems that cublas will handle the problem:

That is, so long as the input and output are separated into real and imaginary parts. Ideally, I’d like to use an interleaved layout.

If you have a view on the best approach, I’d welcome the input.



The easiest way would be to use cublasLt. You are correct that cublasLt currently requires planar layout. There is a function to help transform from interleaved to planar, but it comes with a negative impact on performance.
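To make the two layouts concrete, here is a minimal host-side sketch of the conversion in each direction. This is not the cublasLt transform and not code from any NVIDIA sample. The function names are hypothetical, and the element type is a template parameter so float can stand in for half on the CPU.

```cpp
#include <cstddef>
#include <vector>

// Interleaved: [re0, im0, re1, im1, ...]
// Planar:      [re0, re1, ..., im0, im1, ...]
// Assumes in.size() is even (one real and one imaginary value per element).
template <typename T>
std::vector<T> interleaved_to_planar(const std::vector<T>& in) {
    const std::size_t n = in.size() / 2;  // number of complex elements
    std::vector<T> out(2 * n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[2 * i];          // real plane
        out[n + i] = in[2 * i + 1];  // imaginary plane
    }
    return out;
}

// Inverse of interleaved_to_planar.
template <typename T>
std::vector<T> planar_to_interleaved(const std::vector<T>& in) {
    const std::size_t n = in.size() / 2;
    std::vector<T> out(2 * n);
    for (std::size_t i = 0; i < n; ++i) {
        out[2 * i] = in[i];
        out[2 * i + 1] = in[n + i];
    }
    return out;
}
```

Each conversion is a full extra pass over the matrix, which is presumably where the performance cost comes from if the data has to be converted before and after every GEMM.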

You should also have the functionality in CUTLASS to do what you’re asking. I think you should be able to modify it pretty easily.

If you’re able to come to GTC 2020, there should be updated presentations on both libraries, I think.

Thanks. Could you please link to the interleaved-to-planar function?

There is another interesting presentation on this topic, for anyone following along:

“Towards Half-Precision Computation for Complex Matrices”

The planar <-> interleaved functionality is in mnicely’s complex half-precision example: