Does CUDA IPC guarantee execution order across processes?

Process A: cuMemcpy2D (srcA → dstA, device-to-device copy) -> cuIpcGetMemHandle on dstA (the destination of the previous cuMemcpy2D) -> send handle to Process B over a pipe
Process B: cuIpcOpenMemHandle (from pipe) -> cuMemcpy2D (dstA → dstB, device-to-device copy)

Here dstA is device memory used as a temporary staging buffer for the IPC transfer.
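For concreteness, the flow above might look roughly like this with the driver API (a sketch only: error checking is omitted, and srcA/dstA/dstB, the pitch/size variables, and pipe_fd are placeholders):

```c
/* ---- Process A ---- */
CUDA_MEMCPY2D cp = {0};
cp.srcMemoryType = CU_MEMORYTYPE_DEVICE;
cp.srcDevice     = srcA;
cp.dstMemoryType = CU_MEMORYTYPE_DEVICE;
cp.dstDevice     = dstA;
cp.srcPitch = cp.dstPitch = cp.WidthInBytes = widthBytes;
cp.Height = height;
cuMemcpy2D(&cp);                  /* D2D copy, NULL stream */

CUipcMemHandle h;
cuIpcGetMemHandle(&h, dstA);      /* dstA must come from cuMemAlloc */
write(pipe_fd, &h, sizeof h);     /* hand the buffer to Process B */

/* ---- Process B ---- */
CUipcMemHandle h;
read(pipe_fd, &h, sizeof h);
CUdeviceptr dstA;
cuIpcOpenMemHandle(&dstA, h, CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS);
/* ... cuMemcpy2D from dstA to dstB here ... */
cuIpcCloseMemHandle(dstA);        /* unmap when done */
```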
Assuming both processes are on the same GPU and every call uses the NULL stream, is it guaranteed that cuMemcpy2D (dstA → dstB) happens after cuMemcpy2D (srcA → dstA), even though they are issued from two different processes in two different contexts?
Will dstB be guaranteed to have content of srcA?

The cuMemcpy2D doc does say the API "exhibits synchronous behavior for most use cases". Can I safely say the API will block until the device-to-device memcpy is finished?
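One way to avoid relying on that "most use cases" wording would be an explicit synchronize in Process A before the handle is ever sent, so Process B cannot start its copy until the first one has completed on the device (a sketch; cp, dstA, and pipe_fd are placeholders from the question):

```c
cuMemcpy2D(&cp);                          /* D2D copy on the NULL stream */
cuCtxSynchronize();                       /* block until the copy is done */
CUipcMemHandle handle;
cuIpcGetMemHandle(&handle, dstA);
write(pipe_fd, &handle, sizeof handle);   /* B sees the handle only after this */
```

Because the pipe read in Process B cannot return before the write in Process A, the CPU-side ordering then carries over to the GPU copies regardless of how the two contexts schedule work.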

If I use cuMemcpy2DAsync instead, is it supported to use the same stream across two applications?

Nvm, I believe that as long as I have all CUDA calls on the NULL stream, they'll be synchronized. It's more a question of whether there is a way to share a CUstream across processes, similar to the memory handles in the cuIpc* APIs.
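For anyone finding this later: there is no IPC handle for a CUstream, but the driver API can share a CUevent across processes via cuIpcGetEventHandle / cuIpcOpenEventHandle (the event must be created with CU_EVENT_INTERPROCESS, which requires CU_EVENT_DISABLE_TIMING). That gives cross-process ordering on the GPU without blocking the CPU. A sketch, with error checking omitted and streamA/streamB/cp/cp2/the pipe as placeholders:

```c
/* ---- Process A ---- */
CUevent ev;
cuEventCreate(&ev, CU_EVENT_INTERPROCESS | CU_EVENT_DISABLE_TIMING);
cuMemcpy2DAsync(&cp, streamA);      /* srcA -> dstA on A's stream */
cuEventRecord(ev, streamA);         /* marks completion of the copy */
CUipcEventHandle eh;
cuIpcGetEventHandle(&eh, ev);
/* send eh (alongside the memory handle) over the pipe */

/* ---- Process B ---- */
CUevent ev;
cuIpcOpenEventHandle(&ev, eh);      /* eh received from the pipe */
cuStreamWaitEvent(streamB, ev, 0);  /* streamB waits for A's copy */
cuMemcpy2DAsync(&cp2, streamB);     /* dstA -> dstB, ordered after A's copy */
```

The pipe still provides the CPU-side handshake (B must not call cuStreamWaitEvent before A has recorded the event), but neither process has to block the host waiting for the copies themselves.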