DMA / kernel access to the video overlay area?

The FAQ says that DMA from another PCIe device is not currently possible, but may become available in the future.

My question is this: since G80 cards support video overlay (I haven't checked, I'm just assuming), how is that different from dumping data into an arbitrary memory buffer on the device? Is there an architectural issue (e.g. cache coherency, thread conflicts, whatever), or is it just a matter of not exposing device addresses?
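For comparison, this is roughly the "dump data into a device buffer" path I mean: a minimal sketch using the standard CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, a kernel launch). The kernel `scale` and the helper `upload_and_scale` are hypothetical names made up for illustration. Here the host explicitly asks the driver to do the copy; the question is whether anything else (another PCIe device, or the overlay hardware) could write into such a buffer directly.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for "a kernel that consumes data the host
// dumped into device memory": scales each element in place.
__global__ void scale(float *buf, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] *= k;
}

// Hypothetical helper: copies n floats (n > 0) into a freshly allocated
// device buffer, runs the kernel, and copies the result back.
// Returns 0 on success, -1 on any CUDA error.
int upload_and_scale(const float *host_in, float *host_out, int n, float k)
{
    float *dev = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess)
        return -1;

    // Explicit host-to-device copy requested by the host program;
    // no device address is handed to any other PCIe device.
    if (cudaMemcpy(dev, host_in, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, n, k);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev);
        return -1;
    }

    // cudaMemcpy on the default stream is ordered after the kernel,
    // so the results are complete when it returns.
    cudaError_t err = cudaMemcpy(host_out, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : -1;
}
```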

Could I use the video overlay functionality to transfer data to the video card? That is, can CUDA kernels access the overlay area?