CUDA 9 cufft slower for some sizes

I just installed CUDA 9 and recompiled an existing program that uses cuFFT against it.

Compared to CUDA 8, some 1D FFTs are now faster:
5000 is now factorised as 625x8 instead of 25x25x8
27648 is now factorised as 27x1024 instead of 27x128x8
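
For anyone wanting to sanity-check their own lengths, here is a small host-side sketch that splits a length into powers of 2, 3, 5 and 7. The cuFFT documentation says sizes of the form 2^a x 3^b x 5^c x 7^d get the best-optimised code paths. The function names are made up for illustration. This does not reproduce how cuFFT actually groups the factors into kernels (625x8 versus 25x25x8), since that choice is internal to the library.

```cpp
#include <cassert>
#include <map>

// Hypothetical helper (not part of cuFFT): divide out the primes 2, 3, 5, 7
// from n and record their exponents in `powers`.
// Returns the leftover cofactor; 1 means n = 2^a * 3^b * 5^c * 7^d.
long long smallPrimeFactors(long long n, std::map<int, int>& powers) {
    assert(n > 0);
    const int primes[] = {2, 3, 5, 7};
    for (int p : primes) {
        while (n % p == 0) {
            n /= p;
            ++powers[p];
        }
    }
    return n;
}

// True when the length has no prime factor larger than 7.
bool isSmallPrimeLength(long long n) {
    std::map<int, int> powers;
    return smallPrimeFactors(n, powers) == 1;
}
```

With this, 5000 comes out as 2^3 x 5^4, 27648 as 2^10 x 3^3 and 8192 as 2^13, which matches the products quoted above (625x8 = 25x25x8 = 5000 and 27x1024 = 27x128x8 = 27648).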

However, 8192-point FFTs (which remain factorised as 8192) are now 20% slower.

Has anyone else encountered this regression?
I am running on a 1050 Ti.

Same here. I have a benchmark program that runs multi-stream FFTs of 128k or 256k points across 1 to 128 concurrent streams. On my GTX 1060, performance is 1.6% to 9.4% slower than before…


I would suggest filing bug reports with NVIDIA. Use the bug-reporting form linked from the registered developer website.

Okay, I have filed a bug report.

If anyone is interested, below is some minimal code to time cuFFT for user-supplied lengths.