Multiple Parallel GPUs

I think streams let you issue asynchronous memory copies that are interleaved (and can overlap) with kernel calls. That holds regardless of how many GPUs or CPU cores you have.
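For what it's worth, here is a rough sketch of what I mean, assuming this is CUDA and the runtime API. The `scale` kernel, the two-stream split, and the chunk size are made up for illustration, and error checking is left out to keep it short. As far as I know the copies only overlap with kernels if the host buffer is pinned (hence `cudaMallocHost`) and the device has a copy engine.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only here for illustration: doubles each element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int kStreams = 2;          // assumed number of streams
    const int kChunk = 1 << 20;      // assumed elements per stream
    const int kTotal = kStreams * kChunk;
    const size_t chunkBytes = kChunk * sizeof(float);

    float *host = nullptr;
    float *dev = nullptr;
    // Pinned (page-locked) host memory, so the async copies can really be asynchronous.
    cudaMallocHost(&host, kTotal * sizeof(float));
    cudaMalloc(&dev, kTotal * sizeof(float));
    for (int i = 0; i < kTotal; ++i) host[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream gets its own copy-in, kernel, copy-out sequence.
    // Work within one stream runs in order; work in different streams may overlap.
    for (int s = 0; s < kStreams; ++s) {
        float *h = host + s * kChunk;
        float *d = dev + s * kChunk;
        cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(d, kChunk);
        cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams to finish before touching the results on the host.
    cudaDeviceSynchronize();
    printf("host[0] = %f\n", host[0]);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

All of this runs on a single GPU from a single host thread. With several GPUs you would, I believe, call `cudaSetDevice` before creating each device's streams, but the stream mechanism itself stays the same.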