Writing from GPU memory to memory on a PCI device

I am looking to write a Linux application that does a bunch of CUDA calculations on a GPU and transfers the results to memory on another device.

I want to avoid the copy to host memory and transfer the results directly from GPU memory to the other device's memory.

If I used mmap to map the device into the host address space, could I then use cudaMemcpy() (with cudaMemcpyDeviceToHost or something similar) to write directly to the device?

There's no way to do this at the moment.
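
That leaves the staged route the question hoped to avoid: copy the results from the GPU into a host buffer first, then write that buffer into the mmap'd device region from the CPU. Below is a minimal sketch of the second half in plain C. The function name write_to_mapped_region and its parameters are made up for illustration, and path stands for whatever node or file exposes the device memory for mmap on your system. It assumes src was already filled by a normal cudaMemcpy(src, dev_ptr, len, cudaMemcpyDeviceToHost), ideally into pinned memory from cudaMallocHost().

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes from a host buffer into a memory-mapped region.
 *
 * path   - hypothetical: any file or device node that supports mmap
 * offset - byte offset into the mapping; must be page-aligned
 * src    - host buffer, assumed already filled from the GPU by cudaMemcpy
 * len    - number of bytes to copy; must be greater than zero
 *
 * Returns 0 on success, -1 on failure.
 */
int write_to_mapped_region(const char *path, off_t offset,
                           const void *src, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, offset);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (dst == MAP_FAILED)
        return -1;

    /* CPU-driven copy. Plain memcpy is an assumption made for brevity;
     * real device memory may require specific access widths or ordering. */
    memcpy(dst, src, len);

    return munmap(dst, len);
}
```

The data still crosses the bus twice (GPU to host, then host to device), so this avoids nothing; it only shows where the mmap idea from the question fits once the results are in host memory.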