There seems to be quite a bit of confusion online about the meaning of cudaHostAllocPortable.
I searched the forum, but found no definitive answer.
When do I have to use cudaHostAlloc() with the cudaHostAllocPortable flag, and when is cudaMallocHost() enough?
Is this an issue of using multiple CPU threads, or is this an issue of using multiple GPU devices?
Specifically, suppose I control multiple GPU devices from a single CPU thread.
If I call cudaMallocHost(), is the memory pinned for all GPU devices,
or do I really need to call cudaHostAlloc() with the cudaHostAllocPortable flag?
I suspect that this behavior has changed across CUDA versions.
What is the situation in CUDA 5?