Just back from the first day of the conference.
There were a lot of small mentions of Fermi’s features, but no dedicated technical presentation.
But the chip diagram Jen-Hsun showed had 32 SPs per SM.
This is speculation, pure speculation on my part. It is not confirmed, so don’t trust it. But the phrase “more flexible SMs” was used once, and putting that tiny mention together with the 32 SPs makes me think SMs may now have 64K of shared memory instead of 16K. That would still be the same amount of shared memory per SP: G200 has 16K shared by 8 SPs, or 2K per SP, and 64K shared by 32 SPs is also 2K per SP. The net effect would indeed be more flexible designs, since current kernels would run fine, just with more blocks per SM, but you’d have the option of using more shared memory per block (at the cost of fewer blocks per SM). Please note again that this is entirely my personal speculation… It could also be that “shared memory” is simply gone, replaced by a per-SM cache of the new flat address space.
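To make the per-SP arithmetic concrete, here is a small C++ sketch. The G200 numbers (8 SPs and 16K of shared memory per SM) are the current chip’s; the 32-SP/64K SM is my guess from above, not anything NVIDIA said. The names (`SmConfig`, `shared_per_sp`, `blocks_by_shared`) are made up for illustration, and the block count only considers shared memory, ignoring registers, thread limits, and the per-SM cap on resident blocks.

```cpp
#include <cstdint>

// Per-SM resources. G200 values are the shipping chip's; the Fermi value
// is MY SPECULATION (32 SPs from the keynote diagram, 64K shared is a guess).
struct SmConfig {
    std::uint32_t sps;           // scalar processors (SPs) per SM
    std::uint32_t shared_bytes;  // shared memory per SM, in bytes
};

constexpr SmConfig kG200{8, 16 * 1024};
constexpr SmConfig kFermiGuess{32, 64 * 1024};  // hypothetical, unconfirmed

// Shared memory per SP: the ratio I expect to stay constant.
constexpr std::uint32_t shared_per_sp(SmConfig sm) {
    return sm.shared_bytes / sm.sps;
}

// Blocks that fit on one SM when shared memory is the only limit.
// bytes_per_block must be nonzero. Registers, thread counts and the
// hardware cap on resident blocks are deliberately ignored here.
constexpr std::uint32_t blocks_by_shared(SmConfig sm, std::uint32_t bytes_per_block) {
    return sm.shared_bytes / bytes_per_block;
}
```

By this measure, a kernel using 16K per block gets one block per SM on G200 and four on the guessed SM, while a 48K block can’t run on G200 at all but one would fit on the guessed SM.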
Jen-Hsun also mentioned “C++ support,” but he didn’t explain what that meant. My guess is that it means virtual functions and function pointers… which would make sense if the memory address space is now merged and flattened. That’s hard to do on G200, since a class held in registers doesn’t HAVE an address, so you can’t do the indirections.
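Here’s roughly the kind of code I mean, written as plain host C++ rather than CUDA, since nothing was announced about GPU syntax. It’s a hypothetical sketch (`Shape`, `total_area`, `apply` are invented names) showing why addresses matter: a virtual call or a function-pointer call is an indirect jump chosen at run time, and in typical compilers the virtual case finds the right function by reading a hidden table pointer stored in the object, through the object’s address.

```cpp
#include <cassert>

// Plain host C++, not CUDA: the kind of code "C++ support" might enable on
// the GPU if my guess (virtual functions + function pointers) is right.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;  // typically dispatched via the object's vtable
};

struct Square : Shape {
    float side;
    explicit Square(float s) : side(s) {}
    float area() const override { return side * side; }
};

struct Rect : Shape {
    float w, h;
    Rect(float w_, float h_) : w(w_), h(h_) {}
    float area() const override { return w * h; }
};

// The caller only holds addresses. Which area() runs is decided at run time,
// an indirect call made through each object's address. An object that lived
// only in registers would have no address to put in this array.
float total_area(const Shape* const* shapes, int n) {
    assert(n == 0 || shapes != nullptr);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += shapes[i]->area();
    return sum;
}

// A plain function pointer is the same kind of indirection without classes.
using UnaryOp = float (*)(float);
float twice(float x) { return 2.0f * x; }
float apply(UnaryOp op, float x) { return op(x); }
```

For example, an array holding a 3×3 `Square` and a 2×5 `Rect` gives a `total_area` of 19, with each element’s `area()` picked at run time.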
One question I’ll try to answer is whether the memory address space is flat 64-bit or 32-bit. I would hope it’s 64-bit, even with the painful size of the pointers, since we’re already hitting 32-bit limits hard. It would also allow nicer memory-mapping games with the CPU, since you’d have so much address flexibility. Larrabee uses 64-bit addressing (only).
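Some quick arithmetic on both sides of the 32-bit vs 64-bit trade, again a hedged sketch with made-up helper names (`addressable_bytes`, `pointer_bytes`): a flat, byte-addressed 32-bit space can name only 4 GiB, while 64-bit pointers double what every pointer costs in a pointer-heavy structure.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;

// Bytes nameable by a flat, byte-addressed space of the given width.
// Only valid for address_bits < 64 (2^64 doesn't fit in a uint64_t).
constexpr std::uint64_t addressable_bytes(unsigned address_bits) {
    return std::uint64_t{1} << address_bits;
}

// Storage spent on pointers alone in a pointer-heavy structure, e.g. a tree
// or graph with `nodes` nodes and `ptrs_per_node` links each (hypothetical).
constexpr std::uint64_t pointer_bytes(std::uint64_t nodes, unsigned ptrs_per_node,
                                      unsigned pointer_bits) {
    return nodes * ptrs_per_node * (pointer_bits / 8);
}
```

For a binary tree with a million nodes, the two child links take 8 MB with 32-bit pointers and 16 MB with 64-bit ones; that’s the “painful size” side of the trade, against a hard 4 GiB ceiling on the other.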
The double-precision support was full IEEE 754-2008.
Another surprise… the ECC support is NOT just for device memory. It’s also applied to all caches, to shared memory, and to registers too! I never knew ECC would even apply to registers, but I guess it makes sense.
Jen-Hsun said the first silicon was delivered “only a few days ago” and praised the engineers for getting it working so fast, “though this isn’t at full speed yet, so it’s only 6X and not 8X faster.” It was unclear whether he meant that the drivers needed more tuning for the new circuit timings, etc., or that the silicon itself was slow, but I think he meant the low-level BIOS timings, since he mentioned it in the context of getting the software running.
I think the other speakers talked about watts per flop more than anything else. Everyone on the high end is completely obsessed with electricity usage, which is killing them. It makes me think there’s been work on Fermi’s energy use as well.
Finally, and this is an aside, about 1/3 of the keynote was in 3D. It was surprisingly effective… not just a gimmick. Even Jen-Hsun’s slides were in 3D and it was used nicely. The 3D live video projection of Jen-Hsun on the side screens as he spoke was especially vivid!
Also a nice thumbs up to the RTT guys for a great augmented reality live demo of adding virtual rims and brake shoes to a Ferrari tire. It was excellent: Jen-Hsun moved the tire and a hand-held light, and on video it really, truly looked like the tire’s rims were there. It was so well done that it was hard to understand what he was demoing at first, until you saw that his REAL tire was rimless.
Later, a Tegra demo of augmented reality made me realize that Tegra may be a big deal too; I had always kind of ignored it, since cell phones aren’t my interest.