How fast is CUFFT compared to x86 for non-powers of two?

I use FFTs on x86 for sizes that are mixed powers of 3, 5, and 7, but not for powers of 2. I read in an old (2008) benchmark that CUFFT is not much faster than x86 for non-power-of-two sizes. Is there a newer benchmark comparing CUFFT to x86 for non-powers of two?

Neither of these is a direct answer to your question, but they may be of interest:
The most recently published CUDA (6.5) performance report is here (CUFFT data on slides 7-9):

And in CUDA 7, performance for transform sizes that are composites of 2, 3, 5, or 7 has been significantly improved:
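As a rough illustration of which sizes fall into that class, here is a small C++ sketch that divides out every factor of 2, 3, 5, and 7 and reports whether anything is left over. The names `factor_2357` and `is_2357_size` are hypothetical helpers, not part of the CUFFT API, and the check only classifies a size; it says nothing about how fast CUFFT or an x86 library actually is for it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: exponents of the small primes in
// n = 2^p2 * 3^p3 * 5^p5 * 7^p7 * rest.
struct SmallPrimeFactors {
    int p2 = 0, p3 = 0, p5 = 0, p7 = 0;
    std::uint64_t rest = 1;  // 1 means n has no prime factor larger than 7
};

// Divide out 2, 3, 5 and 7 as many times as possible; whatever remains is `rest`.
SmallPrimeFactors factor_2357(std::uint64_t n) {
    assert(n > 0 && "transform size must be positive");
    SmallPrimeFactors f;
    const std::uint64_t primes[4] = {2, 3, 5, 7};
    int* exponents[4] = {&f.p2, &f.p3, &f.p5, &f.p7};
    for (int i = 0; i < 4; ++i) {
        while (n % primes[i] == 0) {
            n /= primes[i];
            ++*exponents[i];
        }
    }
    f.rest = n;
    return f;
}

// True if n is a composite of only 2, 3, 5 and 7 (pure powers of 2 included).
bool is_2357_size(std::uint64_t n) {
    return n > 0 && factor_2357(n).rest == 1;
}
```

For example, `factor_2357(3675)` yields 3^1 * 5^2 * 7^2 with `rest == 1`, so 3675 is in the class, while 1001 = 7 * 11 * 13 leaves `rest == 143` and is not.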

Thanks. That gives me enough reason to port my x86 code to CUFFT and run some benchmarks myself.

If you have any benchmark data for non-power-of-two FFTs on modern Haswell-class Xeons, I'd love to see it. I spent 30 minutes searching the internet and came up empty-handed.