Is it possible for two separate CPU threads to access pinned (page-locked) memory obtained with cudaHostAlloc(), each issuing its own device-to-host cudaMemcpy at the same time?
In CUDA 3.2 this is not possible.
Does it work with CUDA 4.0?
Is the solution called portable pinned memory? Or does that only move the memory between NUMA domains?
I’m not sure whether the driver allows you to access a context created by one thread from another thread.
As for the double copy, I guess it’s possible with Tesla cards, since they have two DMA engines. But it makes no sense at all, because the limiting factor is PCIe bandwidth. For GeForce cards it’s certainly impossible.
Portable pinned memory is for use with multiple GPUs, not multiple CPU threads.
Just to wrap up:
Portable is the key, i.e. allocating the buffer with the cudaHostAllocPortable flag.
Then different threads can access the whole buffer.
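For anyone finding this later, here is a minimal sketch of what that wrap-up could look like in code. It assumes CUDA 4.0 or later, a single GPU at device index 0, and a C++11 compiler. The helper names (ok, worker, run_concurrent_copies) are made up for illustration and are not part of CUDA. Each thread copies into its own half of the buffer so the two copies never write to the same host addresses. Whether the two transfers really overlap depends on the card's DMA engines, as discussed above.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

// Hypothetical helper: report a CUDA error and return false instead of aborting.
static bool ok(cudaError_t e, const char* what) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(e));
        return false;
    }
    return true;
}

// Each worker thread copies n floats from device memory into its own
// region of the shared pinned host buffer, using its own stream.
static void worker(int device, const float* d_src, float* h_dst,
                   size_t n, bool* success) {
    *success = false;
    if (!ok(cudaSetDevice(device), "cudaSetDevice")) return;
    cudaStream_t stream;
    if (!ok(cudaStreamCreate(&stream), "cudaStreamCreate")) return;
    bool good =
        ok(cudaMemcpyAsync(h_dst, d_src, n * sizeof(float),
                           cudaMemcpyDeviceToHost, stream), "cudaMemcpyAsync") &&
        ok(cudaStreamSynchronize(stream), "cudaStreamSynchronize");
    cudaStreamDestroy(stream);
    *success = good;
}

bool run_concurrent_copies() {
    const size_t n = 1 << 20;  // floats per thread (arbitrary size for the sketch)
    float* h_buf = nullptr;
    float* d_buf = nullptr;
    if (!ok(cudaSetDevice(0), "cudaSetDevice")) return false;
    // Portable flag: the pinned allocation is usable from all CUDA contexts.
    if (!ok(cudaHostAlloc((void**)&h_buf, 2 * n * sizeof(float),
                          cudaHostAllocPortable), "cudaHostAlloc")) return false;
    if (!ok(cudaMalloc((void**)&d_buf, 2 * n * sizeof(float)), "cudaMalloc")) {
        cudaFreeHost(h_buf);
        return false;
    }
    bool filled = ok(cudaMemset(d_buf, 0, 2 * n * sizeof(float)), "cudaMemset");

    bool ok0 = false, ok1 = false;
    // Two CPU threads, each copying into a disjoint half of the pinned buffer.
    std::thread t0(worker, 0, d_buf, h_buf, n, &ok0);
    std::thread t1(worker, 0, d_buf + n, h_buf + n, n, &ok1);
    t0.join();
    t1.join();

    // The device buffer was zero-filled, so both halves should read back as 0.
    bool result = filled && ok0 && ok1 &&
                  h_buf[0] == 0.0f && h_buf[2 * n - 1] == 0.0f;
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return result;
}

int main() {
    std::printf("%s\n", run_concurrent_copies() ? "copies completed"
                                                : "copies failed");
    return 0;
}
```

Something like nvcc -std=c++11 should build it. I have not checked whether the portable flag is strictly required when both threads use the same device on newer toolkits, so treat that part as an assumption taken from this thread.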
A more elegant solution is to use the new CUDA functions: