In Mark Harris’s session later that day, he explained in more detail what “Unified Virtual Memory” actually is. I was confused when I saw UVM mentioned in the keynote, because my first thought was: “Wait, we already have UVA. What’s this??”
The ultimate goal of UVM is basically to use page faults in the virtual memory system to detect when a piece of memory is being accessed on the GPU and migrate those pages to the device, then move them back when the CPU accesses them again. The vision is that things like “cudaMemcpy” should become optimizations rather than requirements for data movement between the CPU and GPU. The full implementation of UVM will require the hardware changes in Maxwell, but the plan is to release a “UVM-lite” in the future that works on Kepler.
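For contrast, here is the kind of boilerplate we all write today; the two cudaMemcpy calls are exactly the “requirements” that UVM is supposed to turn into optional optimizations. (Just a generic example of mine, not code from the session.)

```cpp
// Today's explicit-copy model: the programmer owns all data movement.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Separate host and device pointers.
    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);

    // Required today: copy to the device before the kernel runs...
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    // ...and copy back before the CPU can see the result.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[0] = %f\n", h_data[0]);
    cudaFree(d_data);
    free(h_data);
    return 0;
}
```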
Mark showed a very nice demo where he took a simple CUDA program and rewrote it using “UVM-lite.” This model requires you to use a special “managed” version of cudaMalloc that tells the CUDA runtime you would like to opt into the UVM system. The memory allocated by the managed version of cudaMalloc is then directly usable both on the host and the device, so you no longer need to keep separate host and device pointers around. The copies to and from the device are handled automatically for these managed pointers. (Edit: Note that this is different from UVA, where the memory reads are issued over the PCI-Express bus, but the data is not copied to or from global memory on your behalf.)
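I didn’t write down the exact spelling of the managed allocator from the demo, so take cudaMallocManaged below as a stand-in name, but the rewritten program looked roughly like this: one pointer, no explicit copies, and a synchronize before the CPU reads the result.

```cpp
// The same program under "UVM-lite": one managed pointer, no explicit copies.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // Opt into the UVM system: the runtime now owns the data movement.
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));

    // The same pointer is usable on the host...
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // ...and on the device; the pages migrate automatically.
    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();  // make sure the kernel is done before the CPU reads

    printf("data[0] = %f\n", data[0]);  // copied back on our behalf
    cudaFree(data);
    return 0;
}
```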
Also, I think you might have garbled two different announcements. The Tegra processor after Logan will have Project Denver 64-bit ARM cores and a Maxwell GPU. Volta is a separate thing with the stacked DRAM, and I don’t think there was any mention of ARM with that.