The new Kirk and Hwu book focuses on the G80 architecture, but there are a few pages that talk in general terms about “the new Fermi architecture which arrived as this book was going to press.”
The clues in the book start with the 64-bit address space. It specifically says how useful this is for mapping host and device memory into a single address space, making it easier for the GPU to access host data seamlessly. (Presumably this is done via zero-copy-like transparent transfers over the PCIe bus.) This isn’t really news, though I don’t think it had actually been announced before. The book also says this extends to MULTIPLE GPUs, giving each one non-colliding addresses and allowing device<->device transfers in a peer-to-peer fashion. That implies bus mastering, though the book doesn’t use that term. The book does say the true potential of peer-to-peer memory transfer may take “years to [be] fully exploited in the SDK”. Maybe that means we won’t get this feature immediately, even though the hardware is capable of it?
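For comparison, here is a minimal sketch of the zero-copy mapped host memory that the existing CUDA runtime already offers, which is what I assume the unified address space builds on. The kernel name scale and the buffer size are made up for illustration, and nothing here comes from the book. It assumes a device that supports mapped pinned memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1024;
    float *hostPtr = NULL;
    float *devPtr = NULL;

    // Must be called before the CUDA context is created on this device.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned host memory that is also mapped into the device address space.
    if (cudaHostAlloc((void **)&hostPtr, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) {
        printf("mapped host allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) hostPtr[i] = (float)i;

    // Ask the runtime for the device-side alias of the same memory.
    if (cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0) != cudaSuccess) {
        printf("could not get device pointer\n");
        cudaFreeHost(hostPtr);
        return 1;
    }

    // The kernel reads and writes host memory directly over the bus.
    scale<<<(n + 255) / 256, 256>>>(devPtr, n);
    cudaDeviceSynchronize();  // cudaThreadSynchronize() on older toolkits

    printf("hostPtr[10] = %f\n", hostPtr[10]);
    cudaFreeHost(hostPtr);
    return 0;
}
```

Note that today you still need two pointers (hostPtr and devPtr) for the same buffer. My reading of the book is that a single 64-bit address space would make that distinction unnecessary, but that is my interpretation, not something the book spells out.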
The book does say that certain standard library functions are supported, specifically calling printf() in kernels. This doesn’t sound like the cuPrintf() library. It sounds lower level than that, since “this can lead to system call traps.”
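If in-kernel printf() works the way it reads, usage would presumably look something like the sketch below. The kernel name hello is made up, and this assumes a device of compute capability 2.0 or later plus a toolkit that supports device-side printf().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: each thread prints its own indices.
__global__ void hello()
{
    printf("block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    hello<<<2, 4>>>();

    // Device-side output is buffered, so synchronize before the program
    // exits to give the runtime a chance to flush it to stdout.
    cudaDeviceSynchronize();
    return 0;
}
```

The order of the printed lines across blocks is not something I would rely on.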
Next, there’s a small section about the multiple kernel feature of Fermi. The book says this means it’s now fine to launch kernels with small block counts, since you don’t need to worry about idle SMs any more. That answers a question we had. In my interpretation, it seems a kernel’s blocks now DYNAMICALLY span SMs, meaning your kernel might start out using 10 of 16 SMs, then during its lifetime use 12, or 4, and then go back to 10, load balanced against the other kernels that are running at the same time.
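As a rough sketch of what launching multiple small kernels looks like from the host side, the example below puts two one-block kernels into separate streams. Streams are the existing runtime mechanism for marking work as independent. Whether the two kernels actually overlap on the GPU is up to the hardware and driver, and the kernel name fill and the sizes are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: writes one value into every element.
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main()
{
    const int n = 256;
    int *a = NULL;
    int *b = NULL;
    cudaMalloc((void **)&a, n * sizeof(int));
    cudaMalloc((void **)&b, n * sizeof(int));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Each launch uses a single block, far too few to fill the GPU alone.
    // Separate streams tell the runtime the two kernels are independent,
    // so hardware that supports it may run them side by side.
    fill<<<1, n, 0, s0>>>(a, n, 1);
    fill<<<1, n, 0, s1>>>(b, n, 2);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    int ha = 0, hb = 0;
    cudaMemcpy(&ha, a, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hb, b, sizeof(int), cudaMemcpyDeviceToHost);
    printf("a[0] = %d, b[0] = %d\n", ha, hb);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

On hardware without concurrent kernel support the same code still runs correctly, just with the two kernels executing one after the other.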
Kernels are now interruptible. This is an FAQ here on the forum!
We already knew this, but the book says that kernels will not interfere with the system or the display, even while being debugged and even if a kernel crashes. To me this implies that Nexus will work with a single Fermi card. (Currently it requires 2 GPUs in a single system: one for display, one for compute.)