With the venerable old Kepler-based Titan Black reaching end of life (and getting progressively harder to get hold of; I am currently down to calling OEM PC builders for any hiding in corners), I did some comparison work with the 980. For my working set the 980 delivered about 80% of the performance of the Titan, which I am putting down to the memory bandwidth. Most of my kernels are either small 2D FFTs or relatively simple read-adjust-write operations in global memory. Unfortunately this is a bit of an issue, since a 20% performance drop is not desirable in anything.
Are there any other tricks that people have used on Maxwell (maxas notwithstanding ;P) to eke out extra performance from this advanced, but seemingly not quite as powerful, card? With memory bandwidth being ‘crippled’ as it is, I doubt there is much I can do about the sheer number of global loads and stores.
Alternatively is there any official (or unofficial) news on GM200? I have seen the pictures of the chips, and the ‘performance specs’ but it seems to have gone a bit quiet recently.
I am not aware of any shipping Maxwell-based GPUs that match the memory bandwidth of the top-of-the-line Kepler-based GPUs.
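For reference, the gap can be estimated from the published memory configurations. The sketch below is rough arithmetic, not a measurement: it assumes the reference specs of a 384-bit interface for the Titan Black and a 256-bit interface for the GTX 980, both with GDDR5 at an effective 7 Gbps per pin, and the function name `peak_bandwidth_gbs` is just an illustrative helper.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits
    data_rate_gbps: effective data rate per pin in Gbit/s
    """
    # Each pin moves data_rate_gbps gigabits per second; divide by 8 for bytes.
    return bus_width_bits / 8 * data_rate_gbps


# Assumed reference specs (not taken from this thread).
titan_black = peak_bandwidth_gbs(384, 7.0)
gtx_980 = peak_bandwidth_gbs(256, 7.0)
ratio = gtx_980 / titan_black

print(f"Titan Black: {titan_black:.0f} GB/s")
print(f"GTX 980:     {gtx_980:.0f} GB/s")
print(f"Ratio:       {ratio:.2f}")
```

Under these assumptions the 980 has about two thirds of the Titan Black's theoretical bandwidth, so an observed 80% suggests the workload is not purely bandwidth bound, or that Maxwell recovers some of the difference elsewhere.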
As far as FFT performance goes, NVIDIA reports significantly improved performance in CUDA 7.0, which is currently at the release candidate stage.
Maybe that can give you back some of the “missing” performance. Seems worth a try.
I was aware of CUDA 7, but have not yet investigated. It is definitely worth a try. Similarly, there is the cuFFT callback option, which I have yet to look at, but presumably that would give a speedup in either case. My current hope is that the Titan replacement (full-fat Maxwell) is announced at GTC with the currently presumed 384-bit interface (though 512-bit would be awesome), even if it is only Quadro/Tesla for the time being.
I know that the GTX line is designed for graphics, and in that role the 980 is lightning fast, but it just feels slightly odd that NVIDIA has chosen to stop the big Kepler line with no direct replacement. I am assuming this is down to fab space or something similar, where manufacturing both Kepler and Maxwell is impractical, or that they misestimated how much GK110 ‘stock’ they needed to tide them over to GM200.
Maxwell is an awesome chip and I have some real hopes for it (single-slot 970s, 990s with sensible power and thermal requirements); I just feel that the memory holds it back.
I will readily agree that with the massive growth in FLOPS in recent years, many real-life codes are now becoming limited by memory bandwidth.
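A back-of-the-envelope roofline estimate shows how lopsided this is for a simple read-adjust-write kernel. This is only a sketch under stated assumptions: the peak figures (roughly 4600 GFLOP/s single precision and 224 GB/s for a GTX 980) are approximate reference numbers rather than measurements, and the kernel is assumed to read one 4-byte float, do one floating-point operation, and write one 4-byte float per element.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Simple roofline model: performance is capped either by peak compute
    or by memory bandwidth times arithmetic intensity, whichever is lower."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Assumed, approximate GTX 980 figures (not measured in this thread).
peak = 4600.0       # single-precision GFLOP/s
bandwidth = 224.0   # GB/s

# Hypothetical read-adjust-write kernel: 1 flop per 8 bytes moved.
intensity = 1 / 8

bound = attainable_gflops(peak, bandwidth, intensity)
ridge = peak / bandwidth  # flops per byte needed to become compute bound

print(f"Attainable: {bound:.0f} GFLOP/s of {peak:.0f} peak")
print(f"Compute bound only above {ridge:.1f} flops per byte")
```

With these numbers such a kernel can use well under 1% of the card's arithmetic throughput, so its run time is determined almost entirely by memory bandwidth.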
On the other hand, if I recall correctly from my days involved in building CPUs, high-speed switching at I/O pins can be a significant contributor to overall power consumption since large capacitive loads need to be driven. I guess that the choice of narrower memory interfaces on Maxwell could be related to customer desires for a more power-efficient GPU platform.
But it does not help to speculate; the current bandwidth is what it is. My basic approach here would be practical: since I cannot influence what products NVIDIA may or may not offer at some unspecified point in the future, the best bet would seem to be to try the latest software stack they have on offer today, especially if such an experiment is not particularly costly.
The rumor about the new Titan is that it will be announced at GTC 2015. The leaked specs indicate that the memory bandwidth (GB/s) will be around the same as that of the Titan Black.
I will say that I have been working quite a bit with the GTX 980 as well as the GTX 780 Ti and the GTX Titan Black. What I have learned is that even though the GTX 980 has lower global memory bandwidth overall, the increased compute capability and the improvements in shared memory do offset that deficit a bit.
After some tweaking for Maxwell, I was able to get some custom FFT-based convolution code to run faster on the GTX 980 than the version for the 780 Ti, which was a surprise.
Another interesting data point: we measured the power draw of two GTX 980s running concurrently in the same PC against that of a single GTX 780 Ti, and the two GTX 980s drew only marginally more power than the single GTX 780 Ti.