Hybrid RT HPC: limitations and questions


I’m working on real-time (RT) HPC applications on a hybrid platform (multi-CPU and multi-GPU).
The main limitations of our current implementation are the lack of RTOS compatibility and, of course, the host<->device memory copies.

Here are my questions:
I heard that GE-IP supplies rugged GPGPU hardware based on CUDA, so I guess CUDA drivers for RTOSes are on the way. Is there an updated plan for RTOS CUDA drivers? If so, could I have some information, please?
Alternatively, are Linux kernel-space CUDA APIs on your roadmap?

And regarding our memory transfer issues…
On the PCI-E bus we have

  • several NVIDIA GPUs
  • a multi-channel digitizer

How can we copy directly from one GPU to another? As far as I know it should be doable. Even a hacky way would do the trick for me at the moment.
And what about a dual-GPU card? E.g. on a GTX 295, is it possible to avoid going through PCI-E?
By the way, it would be great if we could copy directly from the digitizer to the GPUs too :P

And my final, secondary question:
Since CUDA 2.2, host memory can be mapped into the CUDA address space. Integrated GPUs, like those in some notebooks, share the same physical memory with the host. I really don’t know how this works, but I’m wondering whether we could then avoid the cudaMemcpy call completely. Putting aside memory coherency issues, we could switch the owner (host or device) of the memory address… And yes, I know this hack would be very ugly, but it’s an interesting challenge nonetheless.
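
To make the question concrete, here is a minimal sketch of the mapped (zero-copy) path as I understand it from the runtime API. The kernel `scale` and the helper `run_zero_copy` are hypothetical names of mine, and I’m assuming device 0 reports `canMapHostMemory`. I use `cudaDeviceSynchronize` here; on CUDA 2.2 the equivalent call is `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: works directly on host memory through the
// device-side alias of the mapped buffer (no cudaMemcpy involved).
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Hypothetical helper: returns the number of wrong elements,
// or -1 if a CUDA call fails or mapping is not supported.
int run_zero_copy(int n)
{
    // Assumption: device 0. The flag has to be set before the
    // context is created on older CUDA versions.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess)
        return -1;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory)
        return -1;

    // Page-locked host buffer that is also mapped into the device address space.
    float *h_buf = NULL;
    if (cudaHostAlloc((void **)&h_buf, n * sizeof(float), cudaHostAllocMapped) != cudaSuccess)
        return -1;

    for (int i = 0; i < n; ++i)
        h_buf[i] = (float)i;

    // Device pointer that aliases the same host allocation.
    float *d_alias = NULL;
    if (cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0) != cudaSuccess) {
        cudaFreeHost(h_buf);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_alias, n, 2.0f);

    // "Ownership" goes back to the host only after the kernel has finished.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFreeHost(h_buf);
        return -1;
    }

    // The host reads the results in place, without any copy back.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_buf[i] != 2.0f * (float)i)
            ++mismatches;

    cudaFreeHost(h_buf);
    return mismatches;
}
```

Calling `run_zero_copy(1024)` should return 0 on a device that supports mapped memory. As far as I understand, on an integrated GPU this really avoids the copy, whereas on a discrete GPU the kernel’s accesses still travel over PCI-E.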
What do you think?