How can I copy data from another GPU in a kernel?

Hello All,
I’m sorry if this is a basic question.

I’m wondering whether a kernel running on one GPU can copy data directly from another GPU. When I use cudaMallocManaged this happens automatically, but I would like to do it manually.

If the GPUs can be put into a peer relationship, you can do a cudaMalloc on one GPU, then pass that pointer to a kernel running on another GPU that has peer access to the first one enabled. Kernels on that other GPU can dereference the pointer to access or copy the data from the first GPU.

Read up on cudaDeviceCanAccessPeer and cudaDeviceEnablePeerAccess.
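
To make the peer-access approach above concrete, here is a minimal sketch, not a definitive implementation. It assumes a machine with at least two GPUs, uses device 0 as the source and device 1 as the destination, and the names copyFromPeer, runPeerCopyDemo and CHECK are made up for illustration. If fewer than two GPUs are present, or they cannot be peers, it prints a message and skips the copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a CUDA error and return 1 from the enclosing function.
// (For brevity, buffers are not freed on the error path.)
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err_));                             \
            return 1;                                                      \
        }                                                                  \
    } while (0)

// Runs on the destination GPU. 'src' points to memory that was allocated
// with cudaMalloc on the peer (source) GPU; 'dst' is local to this GPU.
__global__ void copyFromPeer(const int* src, int* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

// Returns 0 on success (or when the demo has to be skipped), 1 on failure.
int runPeerCopyDemo() {
    const int srcDev = 0, dstDev = 1;   // assumed device numbering
    const int n = 1 << 20;

    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("Fewer than two GPUs found, skipping.\n");
        return 0;
    }

    // Can dstDev directly access memory that lives on srcDev?
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev));
    if (!canAccess) {
        printf("GPU %d cannot access GPU %d as a peer, skipping.\n", dstDev, srcDev);
        return 0;
    }

    // Allocate and fill the source buffer on the source GPU.
    CHECK(cudaSetDevice(srcDev));
    int* src = nullptr;
    CHECK(cudaMalloc(&src, n * sizeof(int)));
    CHECK(cudaMemset(src, 1, n * sizeof(int)));   // every byte = 0x01
    CHECK(cudaDeviceSynchronize());               // make sure the fill is done

    // Switch to the destination GPU and enable access to the source GPU.
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaDeviceEnablePeerAccess(srcDev, 0)); // flags must be 0
    int* dst = nullptr;
    CHECK(cudaMalloc(&dst, n * sizeof(int)));

    // The kernel runs on dstDev and reads straight from srcDev's memory.
    copyFromPeer<<<(n + 255) / 256, 256>>>(src, dst, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Spot-check the first and last elements on the host.
    int first = 0, last = 0;
    CHECK(cudaMemcpy(&first, dst, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(&last, dst + n - 1, sizeof(int), cudaMemcpyDeviceToHost));

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaFree(src));

    return (first == 0x01010101 && last == 0x01010101) ? 0 : 1;
}

int main() {
    int rc = runPeerCopyDemo();
    printf(rc == 0 ? "done\n" : "failed\n");
    return rc;
}
```

Note that peer access is one-directional: the call above only lets device 1 read and write device 0's memory. If kernels on device 0 also need to touch device 1's allocations, enable access in the other direction as well.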

Also look at the CUDA P2P sample codes, such as simpleP2P.