Yeah, in order to sell CUDA to general developers (e.g., games), NVIDIA wants to be able to say “everything will work even if your user doesn’t have an NVIDIA card.” Hence CPU support is a priority. This won’t be like “emulation” mode, which is there so you can debug, etc. This will be ordinary Release mode.
What I’m wondering is how far they’ll take it. Obviously they’ll do multicore. But in theory, they might be able to pull the same “Super-SIMD” trick on the CPU that they pulled on the GPU when they moved from 4-vectors to scalars. A 4-float SSE instruction would actually be applied to a warp of four threads, with each SSE lane holding one thread’s scalar value. With four cores, you’d have 16 threads running simultaneously. It’d be optimized to the max, and all you’d have to do is program to the CUDA model (which is way easier than messing with threads and SSE intrinsics!). At least… that’s what I hope.
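To make the “warp of four threads in one SSE register” idea concrete, here’s a rough sketch of what such a backend might generate for a trivial SAXPY kernel. This is purely my guess at the shape of it, not anything NVIDIA has shown: `saxpy_warp4` and `launch_saxpy` are made-up names, it assumes an x86 CPU with SSE, and it assumes the thread count is a multiple of four.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical compiled "kernel" body for y[tid] = a * x[tid] + y[tid].
// One __m128 register holds the same scalar variable for four threads,
// so a single SSE instruction advances the whole 4-thread warp.
inline void saxpy_warp4(float a, const float* x, float* y) {
    __m128 va = _mm_set1_ps(a);               // broadcast a to all four lanes
    __m128 vx = _mm_loadu_ps(x);              // lane i reads x[tid + i]
    __m128 vy = _mm_loadu_ps(y);              // lane i reads y[tid + i]
    vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);  // a * x + y in every lane
    _mm_storeu_ps(y, vy);                     // lane i writes y[tid + i]
}

// Hypothetical launcher: walks the "grid" one 4-thread warp at a time.
// A real runtime would presumably hand chunks of this loop to each core.
inline void launch_saxpy(float a, const std::vector<float>& x,
                         std::vector<float>& y) {
    assert(x.size() == y.size());
    assert(x.size() % 4 == 0);  // assumption: no partial warps
    for (std::size_t base = 0; base < x.size(); base += 4) {
        saxpy_warp4(a, x.data() + base, y.data() + base);
    }
}
```

The point is that the kernel author would still write the scalar per-thread version; the lane packing above is what the compiler would do behind your back. Divergent branches inside a warp would need some kind of lane masking, which this sketch ignores.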