cudaMemcpyDeviceToDevice from one GPU to another

Can anyone help me solve this problem?

I’m trying to send data from one GPU to another

using

cudaMemcpy(dataD,dataS,1*sizeof(double), cudaMemcpyDeviceToDevice);

with two CPU threads,

but the data is only copied within the same device.

This is one of the most frequently asked questions on this forum. The answer is that you cannot use cudaMemcpy(…, cudaMemcpyDeviceToDevice) to transfer data from one GPU to another. You need a cudaMemcpy(…, cudaMemcpyDeviceToHost) from the source GPU into host memory and then another cudaMemcpy(…, cudaMemcpyHostToDevice) from host memory to the destination GPU.
It is true that the flag name “cudaMemcpyDeviceToDevice” is confusing, but it actually means a copy of data on THE SAME device (i.e. GPU) from one memory address to another.
There is currently no support in CUDA for copying data directly from one device to another (at least not in CUDA 2.1 :) )
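
For illustration, here is a minimal sketch of the two-step copy through host memory. The names dataS and dataD come from the question; the staging buffer, the test value and the CHECK macro are made up for this example. It also assumes a runtime in which a single host thread can switch GPUs with cudaSetDevice. With one CPU thread per GPU, as in the question, the DeviceToHost half would run in the thread that owns the source GPU and the HostToDevice half in the thread that owns the destination GPU, with the staging buffer shared between them.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        std::printf("Need at least two GPUs, found %d\n", deviceCount);
        return 0;
    }

    const double value = 3.14;  // arbitrary test value
    double staging = 0.0;       // host buffer that sits between the two copies
    double result = 0.0;        // used only to check the copy afterwards
    double *dataS = NULL;       // source buffer, allocated on GPU 0
    double *dataD = NULL;       // destination buffer, allocated on GPU 1

    // Put something on the source GPU so there is data to move.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&dataS, sizeof(double)));
    CHECK(cudaMemcpy(dataS, &value, sizeof(double), cudaMemcpyHostToDevice));

    // Step 1: source GPU -> host memory.
    CHECK(cudaMemcpy(&staging, dataS, sizeof(double), cudaMemcpyDeviceToHost));

    // Step 2: host memory -> destination GPU.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&dataD, sizeof(double)));
    CHECK(cudaMemcpy(dataD, &staging, sizeof(double), cudaMemcpyHostToDevice));

    // Read the value back from the destination GPU to confirm it arrived.
    CHECK(cudaMemcpy(&result, dataD, sizeof(double), cudaMemcpyDeviceToHost));
    std::printf("value read back from GPU 1: %f\n", result);

    // Free each buffer while its own device is current.
    CHECK(cudaFree(dataD));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(dataS));

    return result == value ? 0 : 1;
}
```

For larger transfers the staging buffer would be an array of the full size rather than a single double; the two cudaMemcpy calls stay the same.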

Thanks for your answer.