cuFFT: how does it work?

Hello, I’m a computer science student interested in CUDA and in how it parallelizes code. I would like to know how the cuFFT library works internally, that is, how it parallelizes the operations inside its functions.
Although cuFFT is part of the CUDA Toolkit, I could only find the header file. Where can I find details about the functions and how their parallelization is carried out?
(Even a single example would help: one library function and how CUDA executes its operations in parallel.)
Thank you all for your time!

This paper may be of interest:


Thank you so much!
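
To make the parallelization idea concrete, here is a minimal sketch of a textbook radix-2 Cooley-Tukey FFT in CUDA. It is not cuFFT's actual code: the library ships in binary form only, and its documentation says just that it builds on the Cooley-Tukey and Bluestein algorithms. The kernel names bitReverse and butterflyStage and the host helper fftOnGpu are made up for illustration. The sketch assumes n is a power of two (at least 2) and skips error checking.

```cuda
#include <cstdio>
#include <cmath>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Hypothetical kernel: copy the input into bit-reversed index order.
// One thread per element; every thread writes a distinct output slot.
__global__ void bitReverse(const cuFloatComplex* in, cuFloatComplex* out,
                           int n, int logN) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int r = __brev((unsigned int)i) >> (32 - logN);
    out[r] = in[i];
}

// Hypothetical kernel: one radix-2 stage. A stage consists of n/2
// butterflies that touch disjoint pairs of elements, so each butterfly
// can be computed by its own thread with no synchronization.
__global__ void butterflyStage(cuFloatComplex* data, int n, int half) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n / 2) return;
    int group = t / half;              // which sub-transform of size 2*half
    int k = t % half;                  // butterfly index inside that group
    int i = group * 2 * half + k;      // upper element of the pair
    int j = i + half;                  // lower element of the pair
    float angle = -3.14159265358979f * k / half;   // -2*pi*k / (2*half)
    cuFloatComplex w = make_cuFloatComplex(cosf(angle), sinf(angle));
    cuFloatComplex a = data[i];
    cuFloatComplex b = cuCmulf(w, data[j]);
    data[i] = cuCaddf(a, b);
    data[j] = cuCsubf(a, b);
}

// Hypothetical host helper: forward FFT of h[0..n-1], result written back
// to h. Requires n == 1 << logN and logN >= 1. Error checks omitted.
void fftOnGpu(cuFloatComplex* h, int n, int logN) {
    size_t bytes = n * sizeof(cuFloatComplex);
    cuFloatComplex *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, h, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    bitReverse<<<(n + threads - 1) / threads, threads>>>(dIn, dOut, n, logN);
    // log2(n) stages. Launches on the same stream run in order, so each
    // stage sees the finished output of the previous one.
    for (int half = 1; half < n; half *= 2)
        butterflyStage<<<(n / 2 + threads - 1) / threads, threads>>>(dOut, n, half);

    cudaMemcpy(h, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}

int main() {
    const int n = 8, logN = 3;
    cuFloatComplex x[n];
    for (int i = 0; i < n; ++i) x[i] = make_cuFloatComplex((float)i, 0.0f);
    fftOnGpu(x, n, logN);
    for (int i = 0; i < n; ++i)
        printf("X[%d] = %f %+fi\n", i, cuCrealf(x[i]), cuCimagf(x[i]));
    return 0;
}
```

The point to take away is that the work inside one stage is fully independent, so it maps onto n/2 threads, while the stages themselves still run one after another. A production library goes much further than this (larger radices, shared memory, batching), so treat the sketch as an illustration of the principle only. It should build with something like nvcc fft_sketch.cu -o fft_sketch.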