I’m trying to run an FFT on a device with compute capability 7.5 using FP16 inputs and outputs. Timing both precisions with the profiler, I see:
1k FFT (complex float): 2.58 us
1k FFT (complex half-float): 2.68 us
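The half-precision transform is no faster than the single-precision one (it is actually about 4% slower), so I’m curious whether cuFFT has been optimized for half-floats at all.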