Suppose I have a kernel that needs to operate on memory residing on two GPUs. Do I have to copy all the data to the GPU on which the kernel is launched, or can the kernel access another GPU's memory directly?
I wrote a simple test, but the program hangs when the kernel tries to access memory on another GPU.
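
To make the question concrete, here is a minimal sketch of the kind of cross-GPU access I mean. It is an illustration, not the exact test program: the device indices 0 and 1, the kernel name `fill`, the buffer name `buf1`, and the `CHECK` macro are all assumptions made up for this example. It uses the CUDA runtime calls `cudaDeviceCanAccessPeer` and `cudaDeviceEnablePeerAccess`, and it only attempts the access if the runtime reports that device 0 can reach device 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call and exit main with an error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: runs on device 0, writes through a pointer
// that was allocated on device 1.
__global__ void fill(int *remote, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) remote[i] = value;
}

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("need at least two GPUs\n");
        return 0;
    }

    // Ask whether device 0 can directly address memory on device 1.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    if (!canAccess) {
        printf("device 0 cannot access device 1 directly\n");
        return 0;
    }

    // Allocate the buffer on device 1.
    const int n = 1024;
    int *buf1 = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&buf1, n * sizeof(int)));

    // Switch to device 0, enable access to device 1, and launch there.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));  // second argument (flags) must be 0
    fill<<<(n + 255) / 256, 256>>>(buf1, n, 42);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Copy one element back to the host to inspect the result.
    int first = 0;
    CHECK(cudaMemcpy(&first, buf1, sizeof(int), cudaMemcpyDefault));
    printf("first element: %d\n", first);

    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(buf1));
    return 0;
}
```

The sketch checks every runtime call so that a failure shows up as an error message rather than as silent misbehaviour. Whether a missing `cudaDeviceEnablePeerAccess` call or an unsupported GPU pairing explains the hang in my test is something I have not confirmed.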