cuFFT speed on GTX 980

I don’t know who else has compared the GTX 980 with the GTX 780 Ti, but I have found the 780 Ti to be about 45% faster when computing radix-32, radix-64, and radix-128 passes. This holds across multiple host systems. Most of my other kernels are comparable in speed, but I wondered whether anyone else has compared the two with cuFFT, and whether NVIDIA has noticed this issue. Thanks!

FFTs are bandwidth limited, and the 780 Ti has a 384-bit memory interface versus the 256-bit interface of the 980, so the results you are reporting are expected.

Specifically, the respective memory bandwidth of the GTX 780 Ti and the GTX 980 is as follows:

GTX 780 Ti memory bandwidth: 336 GB/sec
GTX 980 memory bandwidth: 224 GB/sec (source: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980/specifications)
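As a rough check, peak bandwidth is the bus width in bytes times the effective memory data rate. The sketch below assumes both cards run their GDDR5 at 7 Gbps effective, which is what the quoted figures imply (336 / 48 = 224 / 32 = 7). With equal data rates, the 384-bit vs 256-bit bus alone accounts for a 1.5x bandwidth gap, close to the roughly 45% difference reported above.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Each of the bus_width_bits pins moves data_rate_gbps gigabits per second,
    so dividing the bus width by 8 converts bits to bytes.
    """
    return bus_width_bits / 8 * data_rate_gbps

# Assumption: 7 Gbps effective GDDR5 on both cards (implied by the spec figures).
gtx_780_ti = peak_bandwidth_gb_s(384, 7.0)  # 336.0
gtx_980 = peak_bandwidth_gb_s(256, 7.0)     # 224.0
print(gtx_780_ti, gtx_980, gtx_780_ti / gtx_980)  # 336.0 224.0 1.5
```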

Out of curiosity, does that factor scale? That is, are the losses similar for 512 and 1024 passes? I ask because my image processing project uses lots of 128x128, 512x512, and 1024x1024 2D FFTs. When they announced the narrower memory bus I had a feeling it would cause problems, but I was not sure whether Maxwell’s other improvements would compensate.
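If the transforms really are bandwidth bound, a first-order model says the gap should not depend on size: each global-memory pass reads and writes the whole array once, so time is roughly passes x 2 x bytes / bandwidth, and the ratio between the two cards is just the bandwidth ratio. This is a sketch under stated assumptions: single-precision complex data (8 bytes per element), peak bandwidth actually achieved, and a hypothetical `passes` parameter (e.g. one pass per dimension). The number of passes cuFFT really uses depends on its plan, which this model does not know. The `batch` argument covers batched transforms; host/device transfers are left out. Small sizes such as 128x128 (128 KiB of data) may stay largely in on-chip cache or be dominated by launch overhead, where this model is unlikely to hold.

```python
BYTES_PER_COMPLEX64 = 8  # one single-precision complex value (2 x float32)

def fft2d_time_ms(n: int, bandwidth_gb_s: float, passes: int = 2, batch: int = 1) -> float:
    """Bandwidth-bound time estimate in ms for a batch of n x n complex64 2D FFTs.

    Assumes each pass streams the whole batch from DRAM and back (one read plus
    one write) at peak bandwidth. `passes` is a hypothetical parameter: the real
    count depends on the cuFFT plan.
    """
    bytes_per_pass = 2 * n * n * BYTES_PER_COMPLEX64 * batch  # read + write
    seconds = passes * bytes_per_pass / (bandwidth_gb_s * 1e9)
    return seconds * 1e3

for n in (128, 512, 1024):
    t_780ti = fft2d_time_ms(n, 336.0)
    t_980 = fft2d_time_ms(n, 224.0)
    # Under this model the ratio is 336 / 224 = 1.5 for every n.
    print(f"{n}x{n}: 780 Ti {t_780ti:.4f} ms, 980 {t_980:.4f} ms, ratio {t_980 / t_780ti:.2f}")
```

Measured ratios that drift away from 1.5 at particular sizes would point to effects outside this model, such as caching, compute, or the plan cuFFT picks.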

I have the same question; I’m also interested in the performance of similarly sized 32-bit FFTs, batched ones in particular, not counting host/device transfer times.

Maxwell looks like excellent value as a 32-bit compute card, especially if an 8 GB variant comes out, but I’m wary of the memory bandwidth. Could you share some more details of your benchmarks to put them in context?

Has anyone else seen comprehensive FFT tests on Maxwell somewhere? My guess would be that bandwidth to GPU memory shouldn’t be that restrictive, especially if the updated toolchain makes good use of the increased cache size. Once you’ve filled your cache with relevant data, the compute intensity of FFTs should be quite high. To be clear, though, I’m shooting from the hip here; I’ve never actually gotten into the down and dirty of optimizing an FFT kernel for GPUs myself.
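One way to test the compute-intensity intuition is to compare an FFT’s arithmetic intensity with each card’s machine balance, that is, peak FLOP/s divided by peak bytes/s. The sketch uses the conventional 5·N·log2(N) flop count for a length-N complex FFT and assumes the best case, where the data is read and written exactly once (16 bytes per single-precision complex element). The core counts and base clocks are assumptions taken from NVIDIA’s published specs as we understand them, with peak computed as cores x clock x 2 (one fused multiply-add per core per cycle). Even in that best case, a 1024-point FFT comes out at about 3 flop/byte, well below either card’s balance of roughly 15 to 20 flop/byte. That is consistent with the bandwidth-bound explanation given earlier in the thread.

```python
import math

def fft_intensity_flop_per_byte(n: int) -> float:
    """Best-case arithmetic intensity of one length-n complex64 FFT.

    Uses the conventional 5*n*log2(n) flop count and assumes the data is read
    once and written once (2 x 8 bytes per element), i.e. perfect on-chip reuse.
    """
    flops = 5 * n * math.log2(n)
    bytes_moved = 2 * 8 * n
    return flops / bytes_moved

def machine_balance(cores: int, clock_ghz: float, bandwidth_gb_s: float) -> float:
    """Peak single-precision GFLOP/s (cores x clock x 2 for FMA) per GB/s of DRAM bandwidth."""
    peak_gflops = cores * clock_ghz * 2
    return peak_gflops / bandwidth_gb_s

# Assumed spec figures (CUDA cores, base clock in GHz, GB/s); check against NVIDIA's pages.
balance_980 = machine_balance(2048, 1.126, 224.0)
balance_780ti = machine_balance(2880, 0.875, 336.0)

for n in (128, 512, 1024):
    print(f"N={n}: {fft_intensity_flop_per_byte(n):.2f} flop/byte")
print(f"balance: 980 {balance_980:.1f}, 780 Ti {balance_780ti:.1f} flop/byte")
```

A larger cache raises the effective intensity only to the extent that it avoids extra DRAM passes. It cannot push a single read-and-write pass above the best-case figure computed here.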

In general, when there is heavy use of shared memory and a great deal of 32-bit compute, the GTX 980 outperforms the GTX 780 Ti by as much as 20-25%.

Overall, though, that bandwidth difference is significant, and most of my applications still perform better on the GTX 780 Ti. Also, the GTX 980 does not do well with 64-bit (double-precision) work, and the GTX 780 Ti actually does better there (notable, considering it is an older consumer GPU).

Thanks. Perhaps this is a stupid question, by the way, but is Maxwell’s new memory compression enabled for these types of memory access? Does it apply regardless, or only to texture fetches? I don’t see any information about it anywhere.

I don’t care about double precision at the moment. Even if it is a little slower in single precision, the 980 is still great value, and great efficiency, I suppose. Still, if the difference is indeed 45% in the benchmarks relevant to me, as suggested by the OP, I would stay away from Maxwell for now…