Just installed CUDA 9 and compiled an existing program that uses cuFFT against it.
Compared to CUDA 8, some 1D FFTs are now faster:
5000 is now factorised as 625x8 instead of 25x25x8
27648 is now factorised as 27x1024 instead of 27x128x8
However, 8192-point FFTs (which remain factorised as 8192) are now 20% slower.
Has anyone else encountered this regression?
I am running on a GTX 1050 Ti.
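For anyone comparing notes, here is a small sanity check of the factorisations quoted above. It is only a sketch: the `PLANS` table and the `summarise` helper are made-up names for illustration, not anything cuFFT exposes, and the factor lists are copied by hand from this post. It confirms that each factor list multiplies back to the transform size and counts how many factors each CUDA version uses.

```python
from math import prod

# Factorisations as quoted in the post, keyed by transform size.
# Hypothetical layout for illustration; cuFFT does not report plans this way.
PLANS = {
    5000: {"cuda8": (25, 25, 8), "cuda9": (625, 8)},
    27648: {"cuda8": (27, 128, 8), "cuda9": (27, 1024)},
    8192: {"cuda8": (8192,), "cuda9": (8192,)},
}


def summarise(plans):
    """Return {size: {version: (factor_count, product_matches_size)}}."""
    summary = {}
    for size, versions in plans.items():
        summary[size] = {
            version: (len(factors), prod(factors) == size)
            for version, factors in versions.items()
        }
    return summary


summary = summarise(PLANS)
for size, versions in summary.items():
    for version, (count, ok) in versions.items():
        print(f"{size:>6} {version}: {count} factor(s), product matches: {ok}")
```

The two sizes that got faster both went from three factors to two, while 8192 is a single factor under both versions, so its slowdown is not explained by a change in factorisation.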
Same here. I have a benchmark program that runs multi-stream FFTs of 128k or 256k points on 1 to 128 concurrent streams; on my GTX 1060, performance is 1.6% to 9.4% slower than before…