Fermi info

This may sound like an unusual question, but here goes. My managers want to know about the NVIDIA Fermi chip, and I only know a little about it from what I have read in the news. Their main question is what kind of speed boost we can expect from it. In other words, how much faster will things run on Fermi compared with current hardware?


Full specifications have not been revealed yet, but the architecture whitepaper can be found here.

The target performance numbers were on the order of 500-600 Gflop/s double precision peak, compared with about 80 Gflop/s double precision peak on the current GT200 Tesla. Of course, those are peak numbers; how the architectural changes translate into real performance is basically anyone’s guess. Reaching them would also require code that is well tuned for the new architecture, or in some cases any code at all…
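To put those peak figures side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the names are made up for illustration, and the result is a ratio of theoretical peaks, not a measured speedup.

```python
# Peak double precision figures quoted above, in Gflop/s.
FERMI_DP_PEAK_RANGE = (500.0, 600.0)  # target range mentioned for Fermi
GT200_DP_PEAK = 80.0                  # approximate figure for the current GT200 Tesla


def peak_ratio(new_peak, old_peak):
    """Ratio of two peak throughput figures (an upper bound, not a benchmark)."""
    return new_peak / old_peak


low, high = (peak_ratio(p, GT200_DP_PEAK) for p in FERMI_DP_PEAK_RANGE)
print(f"Peak DP ratio: {low:.2f}x to {high:.2f}x")  # prints 6.25x to 7.50x
```

So on paper the double precision peak goes up by roughly 6x to 7.5x, but real kernels rarely run at peak, so treat that as a ceiling.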

For well-optimized, single precision, compute-bound kernels, it sounds like you’ll get roughly a factor of 2 over a GTX 285 or Tesla C1060. For memory-bound kernels that already have good coalescing, a little less than a factor of 2. (Probably. Although the architecture has been well documented, the clock rates on the soon-to-be-released hardware are still the subject of rumors.)
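If you need a single number to hand to management, one rough way to combine those per-kernel factors is to weight them by how much of your current GPU runtime each kind of kernel takes. The sketch below is only that, a sketch: the 60/40 time split is invented, and 1.8 is just my stand-in for "a little less than a factor of 2". Plug in your own profile data.

```python
def overall_speedup(time_fractions, speedups):
    """Overall speedup when each slice of the old runtime gets its own speedup factor."""
    if abs(sum(time_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    # New runtime, relative to an old runtime of 1.0.
    new_time = sum(frac / speedups[name] for name, frac in time_fractions.items())
    return 1.0 / new_time


# Assumed per-category speedups based on the rough estimates above.
speedups = {"compute_bound": 2.0, "memory_bound": 1.8}

# Hypothetical share of current GPU runtime per category (replace with profiler data).
time_fractions = {"compute_bound": 0.6, "memory_bound": 0.4}

estimate = overall_speedup(time_fractions, speedups)
print(f"Estimated overall speedup: {estimate:.2f}x")
```

The estimate is dominated by whichever category takes the most time today, so a profile of the existing code matters more than the exact per-kernel factors.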

One thing to consider is that the huge architectural improvements in Fermi will improve the performance of code that did not run well in CUDA before by much more than a factor of 2. It’s hard to quantify how much improvement that will be without getting your hands on some hardware, though. I have several test kernels I want to write and run on Fermi to see how they perform. Algorithms that make heavy use of atomics, that randomly access data small enough to fit in the L1 or L2 caches, or that use a lot of double precision operations will see much larger performance increases.

So in some ways, the people who should look most closely at Fermi are the people with kernels that run poorly on GT200.