Hi, I have an application that starts two processes, each with its own context on a separate CUDA device. Later, it needs to copy data from one device to the other. I assume this is the purpose of cuMemcpyPeer(). The two CUcontext handles and the two CUdeviceptr values are shared between the two processes through IPC. When I try to do this:
cuMemcpyPeer(dst_dev_ptr, dst_context, src_dev_ptr, src_context, size);
it executes and returns success, but the data copied between the two contexts is wrong.
How am I supposed to do this?