FERMI has 2 DMA engines, so you can move data IN and OUT of GPU memory (from/to system RAM) SIMULTANEOUSLY, along with KERNEL EXECUTIONS…
This FEATURE is necessary because of their MULTIPLE-KERNEL execution strategy. This OPENS UP parallelism at a NEW LEVEL.
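For anyone wondering how you would actually use that, here is a rough sketch (not taken from any NVIDIA material) of overlapping copies and kernels with CUDA streams. The kernel `scale`, the helper `run_overlapped` and the chunk size are made up for illustration. It assumes pinned host memory, which async copies need, and whether anything really overlaps depends on the card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element in place.
__global__ void scale(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Hypothetical helper: returns the number of wrong elements (0 = success),
// or -1 on a CUDA error.
int run_overlapped() {
    const int kStreams = 2;
    const int kChunk = 1 << 20;
    const size_t bytes = kChunk * sizeof(float);

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    // asyncEngineCount == 2 means copies in both directions can overlap.
    printf("copy engines: %d, concurrent kernels: %d\n",
           prop.asyncEngineCount, prop.concurrentKernels);

    float* h = NULL;
    float* d = NULL;
    // Pinned (page-locked) host memory is required for truly async copies.
    if (cudaMallocHost(&h, kStreams * bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d, kStreams * bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return -1;
    }
    for (int i = 0; i < kStreams * kChunk; ++i) h[i] = 1.0f;

    cudaStream_t s[kStreams];
    for (int k = 0; k < kStreams; ++k) cudaStreamCreate(&s[k]);

    // Each stream handles its own chunk: copy in, compute, copy out.
    // Work queued in different streams is allowed to overlap.
    for (int k = 0; k < kStreams; ++k) {
        float* hp = h + k * kChunk;
        float* dp = d + k * kChunk;
        cudaMemcpyAsync(dp, hp, bytes, cudaMemcpyHostToDevice, s[k]);
        scale<<<(kChunk + 255) / 256, 256, 0, s[k]>>>(dp, kChunk);
        cudaMemcpyAsync(hp, dp, bytes, cudaMemcpyDeviceToHost, s[k]);
    }
    cudaError_t err = cudaDeviceSynchronize();

    int bad = 0;
    for (int i = 0; i < kStreams * kChunk; ++i)
        if (h[i] != 2.0f) ++bad;

    for (int k = 0; k < kStreams; ++k) cudaStreamDestroy(s[k]);
    cudaFree(d);
    cudaFreeHost(h);
    return err == cudaSuccess ? bad : -1;
}
```

Call `run_overlapped()` from your own `main` and build with nvcc. With two copy engines, the upload for one chunk can run while the other chunk is being downloaded or computed on.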
And, yeah, up to 1 TB of addressable RAM. Cool. And UNIFIED POINTER support. Thus, at run-time, the hardware can determine whether a pointer refers to shared OR global memory… That means not all 64 bits are used for Global Memory addresses.
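To make the unified pointer bit concrete, here is a small sketch of my own (the names `sum3`, `demo` and `generic_sum` are made up). The device function takes a plain `const float*`, and the kernel picks between a shared and a global pointer at run-time, so the memory space is not known when `sum3` is compiled. It assumes a Fermi-class or newer GPU.

```cuda
#include <cuda_runtime.h>

// Takes a plain pointer; it does not know which memory space it points into.
__device__ float sum3(const float* p) { return p[0] + p[1] + p[2]; }

// Hypothetical kernel: picks a shared or global pointer at run-time.
__global__ void demo(const float* g, float* out, int useShared) {
    __shared__ float s[3];
    if (threadIdx.x < 3) s[threadIdx.x] = g[threadIdx.x] * 10.0f;
    __syncthreads();
    if (threadIdx.x == 0) {
        const float* p = useShared ? s : g;  // decided at run-time
        out[0] = sum3(p);
    }
}

// Hypothetical host helper: returns the sum computed on the GPU,
// or -1.0f on a CUDA error.
float generic_sum(bool useShared) {
    const float hostIn[3] = {1.0f, 2.0f, 3.0f};
    float hostOut = -1.0f;
    float* g = NULL;
    float* out = NULL;
    if (cudaMalloc(&g, sizeof(hostIn)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&out, sizeof(float)) != cudaSuccess) {
        cudaFree(g);
        return -1.0f;
    }
    cudaMemcpy(g, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);
    demo<<<1, 32>>>(g, out, useShared ? 1 : 0);
    cudaError_t err =
        cudaMemcpy(&hostOut, out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(out);
    cudaFree(g);
    return err == cudaSuccess ? hostOut : -1.0f;
}
```

The same `sum3` code serves both cases: through the global pointer it adds 1 + 2 + 3, through the shared pointer it adds the scaled copies 10 + 20 + 30.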
A configurable L1 cache per SM - can be configured as 16K Shared Mem + 48K L1 OR vice versa
Unified L2 support
and what not…
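On the configurable L1 point, the runtime API exposes the split as a per-kernel preference. A minimal sketch, with two made-up kernels (`tileKernel`, `streamKernel`) and a made-up helper `configure_caches`; the setting is only a hint, and the driver may ignore it on hardware where the split is fixed.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that leans on shared memory (reverses a 256-element tile).
// Expects to be launched with 256 threads per block.
__global__ void tileKernel(float* d) {
    __shared__ float tile[256];
    tile[threadIdx.x] = d[threadIdx.x];
    __syncthreads();
    d[threadIdx.x] = tile[255 - threadIdx.x];
}

// Hypothetical kernel with no shared memory use at all.
__global__ void streamKernel(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

// Hypothetical helper: returns true if both preferences were accepted.
bool configure_caches() {
    // Ask for the larger shared memory split (48K shared + 16K L1 on Fermi).
    cudaError_t a = cudaFuncSetCacheConfig(tileKernel, cudaFuncCachePreferShared);
    // Ask for the larger L1 split (16K shared + 48K L1 on Fermi).
    cudaError_t b = cudaFuncSetCacheConfig(streamKernel, cudaFuncCachePreferL1);
    return a == cudaSuccess && b == cudaSuccess;
}
```

The preference is set once per kernel before launching it; which split actually helps is something you would have to measure.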
I am sure they will be pricing these for elephants…
It would be good if NV allowed developers to submit jobs for FERMI and get results, much like what Intel does with http://paralleluniverse.intel.com
Long Live NVIDIA!