If someone knows the best (easiest to code) way to do a half-precision complex GEMM using tensor cores, I’d really appreciate any help.
It seems that, about a year ago, this wasn’t possible in cutlass (page 4):
The best approach then was to map the problem to an equivalent real problem. On the other hand, planar complex GEMMs are mentioned in the latest cutlass profiler:
But I suspect (line 139 in the above file) that only the basic GEMM is covered.
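For concreteness, here is a minimal sketch of the "equivalent real problem" mapping, written as plain host C++ with `float` standing in for half precision. The names `real_gemm` and `complex_gemm_via_real` are illustrative only and are not part of cutlass or cublas; on a GPU, `real_gemm` would be replaced by whatever tensor-core real GEMM is available. It relies on the identity [Ar -Ai; Ai Ar] * [Br; Bi] = [Cr; Ci].

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Plain row-major real GEMM: C (m x n) = A (m x k) * B (k x n).
// Hypothetical stand-in for a tensor-core real GEMM.
std::vector<float> real_gemm(const std::vector<float>& A,
                             const std::vector<float>& B,
                             int m, int n, int k) {
    assert(A.size() == static_cast<std::size_t>(m * k));
    assert(B.size() == static_cast<std::size_t>(k * n));
    std::vector<float> C(m * n, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}

// Complex GEMM expressed as one real GEMM of doubled size:
//   [Ar -Ai]   [Br]   [Cr]
//   [Ai  Ar] * [Bi] = [Ci]
// A is m x k, B is k x n, both row-major.
std::vector<std::complex<float>> complex_gemm_via_real(
        const std::vector<std::complex<float>>& A,
        const std::vector<std::complex<float>>& B,
        int m, int n, int k) {
    std::vector<float> Ae(2 * m * 2 * k);  // expanded A, (2m) x (2k)
    std::vector<float> Be(2 * k * n);      // stacked B,  (2k) x n
    for (int i = 0; i < m; ++i) {
        for (int p = 0; p < k; ++p) {
            const float re = A[i * k + p].real();
            const float im = A[i * k + p].imag();
            Ae[i * 2 * k + p] = re;             // top-left:     Ar
            Ae[i * 2 * k + k + p] = -im;        // top-right:   -Ai
            Ae[(m + i) * 2 * k + p] = im;       // bottom-left:  Ai
            Ae[(m + i) * 2 * k + k + p] = re;   // bottom-right: Ar
        }
    }
    for (int p = 0; p < k; ++p) {
        for (int j = 0; j < n; ++j) {
            Be[p * n + j] = B[p * n + j].real();        // top block:    Br
            Be[(k + p) * n + j] = B[p * n + j].imag();  // bottom block: Bi
        }
    }
    const std::vector<float> Ce = real_gemm(Ae, Be, 2 * m, n, 2 * k);
    std::vector<std::complex<float>> C(m * n);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            C[i * n + j] = {Ce[i * n + j], Ce[(m + i) * n + j]};
    return C;
}
```

The expanded A holds four times as many elements as the original and the real GEMM does 4·m·n·k multiply-adds, the same count as a direct complex GEMM, so the extra cost of this mapping is mainly memory and the expansion step rather than arithmetic.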
Cutlass aside, it seems that cublas will handle the problem:
That is, so long as the input and output are split into separate real and imaginary parts (a planar layout). Ideally, I’d like to use an interleaved layout.
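To make the layout difference concrete, a minimal host-side sketch of converting between the two follows. It assumes interleaved storage means `[re0, im0, re1, im1, ...]` and planar storage means two separate arrays; `Planar`, `interleaved_to_planar` and `planar_to_interleaved` are hypothetical helper names, not library API. The element type is a template parameter so a half type could be substituted for `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Planar complex storage: all real parts in one array, all imaginary
// parts in another.
template <typename T>
struct Planar {
    std::vector<T> re;
    std::vector<T> im;
};

// Split an interleaved buffer [re0, im0, re1, im1, ...] into two planes.
template <typename T>
Planar<T> interleaved_to_planar(const std::vector<T>& z) {
    assert(z.size() % 2 == 0);
    const std::size_t n = z.size() / 2;
    Planar<T> p{std::vector<T>(n), std::vector<T>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        p.re[i] = z[2 * i];
        p.im[i] = z[2 * i + 1];
    }
    return p;
}

// Merge two planes back into an interleaved buffer.
template <typename T>
std::vector<T> planar_to_interleaved(const Planar<T>& p) {
    assert(p.re.size() == p.im.size());
    std::vector<T> z(2 * p.re.size());
    for (std::size_t i = 0; i < p.re.size(); ++i) {
        z[2 * i] = p.re[i];
        z[2 * i + 1] = p.im[i];
    }
    return z;
}
```

If the planar route is taken, a conversion like this would run once on the inputs and once on the output, which adds a pass over each matrix but leaves the GEMM itself untouched.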
If you have a view on the best approach, I’d welcome the input.