I have a basic question about how cl_mem objects work.
Say I have a context created from 2 CUDA GPUs, and I create a cl_mem buffer in that context.
Later, I issue a clEnqueueRead… call to read that buffer.
Now, how much of that buffer comes from device 0 and how much comes from device 1?