I am using cublasCgemm with single-precision floats for signal processing on an AGX Xavier GPU. The physics guys have done some experiments with 8-bit fixed-point complex numbers, and those calculations work. Is there a fixed-point CUDA implementation of a complex matrix multiply that I can use, and would it be faster than my single-precision implementation?
Please check out cublasGemmEx. See the compute type table in the cuBLAS documentation for the supported data-type combinations.
Are you interested in complex INT8 for A and B, with FP32 for both the compute type and C?
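
To make that combination concrete, here is a minimal host-side sketch of the arithmetic it implies: 8-bit complex inputs, with products accumulated and stored in single precision. It is only a CPU reference for checking results, not a cuBLAS call. The names `ComplexInt8` and `cgemm_int8_ref` are hypothetical, and the row-major layout is an assumption made for readability (cuBLAS itself uses column-major storage).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: one complex sample stored as two signed 8-bit integers.
struct ComplexInt8 {
    std::int8_t re;
    std::int8_t im;
};

// Reference C = A * B with INT8 complex inputs and FP32 complex output.
// Assumed layout: row-major, A is m x k, B is k x n, C is m x n.
std::vector<std::complex<float>> cgemm_int8_ref(
    const std::vector<ComplexInt8>& A,
    const std::vector<ComplexInt8>& B,
    int m, int n, int k)
{
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);

    std::vector<std::complex<float>> C(static_cast<std::size_t>(m) * n);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc_re = 0.0f;
            float acc_im = 0.0f;
            for (int p = 0; p < k; ++p) {
                const ComplexInt8 a = A[static_cast<std::size_t>(i) * k + p];
                const ComplexInt8 b = B[static_cast<std::size_t>(p) * n + j];
                // (a.re + i*a.im) * (b.re + i*b.im), widened to float first.
                acc_re += float(a.re) * float(b.re) - float(a.im) * float(b.im);
                acc_im += float(a.re) * float(b.im) + float(a.im) * float(b.re);
            }
            C[static_cast<std::size_t>(i) * n + j] = {acc_re, acc_im};
        }
    }
    return C;
}
```

Comparing the output of a reference like this against the existing cublasCgemm path on the same 8-bit data is one way to confirm that a reduced-precision GEMM gives equivalent results before measuring whether it is actually faster on the Xavier.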