cuMemcpyDtoD with overlapping memory

Say I have a pointer to device memory and I want to use cuMemcpyDtoD. In particular, the source region covers the last two-thirds of the memory region, and I want to copy it to the beginning of that region. In other words, the source and destination overlap. Can I rely on cuMemcpyDtoD handling this correctly?

No.

I didn’t find a formal statement for cuMemcpyDtoD itself, but the runtime API documentation gives one for the corresponding copy function here:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g85073372f776b4c4d5f89f7124b7bf79

“The memory areas may not overlap.”

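You could instead perform several copies of at most (srcptr - destptr) elements each, one after another, working from the front of the region to the back. Each individual copy then has a source and destination that do not overlap, and each chunk is read before a later copy overwrites it.

Something like: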

int n = 30;
int* array;

...

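// want to copy array[10..29] to the front

// first chunk: array[10..19] -> array[0..9]
cudaMemcpy(array, array + 10, sizeof(int) * 10, cudaMemcpyDeviceToDevice);
// second chunk: array[20..29] -> array[10..19]
cudaMemcpy(array + 10, array + 20, sizeof(int) * 10, cudaMemcpyDeviceToDevice);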
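The same idea can be generalised to arbitrary sizes. Below is a minimal host-side sketch, not a definitive implementation: copy_down_chunked and shift_example are hypothetical names, and std::memcpy stands in for the device copy because it has the same "areas may not overlap" restriction. On the GPU you would presumably pass a callable that wraps cudaMemcpy(d, s, bytes, cudaMemcpyDeviceToDevice), or cuMemcpyDtoD with CUdeviceptr arithmetic, and check its return value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: moves `count` bytes from `src` down to `dst`
// (dst must be strictly below src) using only copies whose source and
// destination ranges do not overlap. `copy(d, s, bytes)` is any
// non-overlapping copy primitive.
template <typename CopyFn>
void copy_down_chunked(unsigned char* dst, const unsigned char* src,
                       std::size_t count, CopyFn copy) {
    assert(dst < src);
    // Distance between the two regions: the largest chunk that cannot overlap.
    const std::size_t gap = static_cast<std::size_t>(src - dst);
    // Go front to back so every chunk is read before it gets overwritten.
    for (std::size_t done = 0; done < count; done += gap) {
        const std::size_t chunk = std::min(gap, count - done);
        copy(dst + done, src + done, chunk);
    }
}

// Host-only demonstration: fills a[i] = i, then moves a[offset..n-1]
// to the front. Requires 0 < offset <= n.
std::vector<int> shift_example(std::size_t n, std::size_t offset) {
    std::vector<int> a(n);
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<int>(i);
    auto* base = reinterpret_cast<unsigned char*>(a.data());
    copy_down_chunked(base, base + offset * sizeof(int),
                      (n - offset) * sizeof(int),
                      [](unsigned char* d, const unsigned char* s,
                         std::size_t bytes) {
                          std::memcpy(d, s, bytes);  // stand-in for the D2D copy
                      });
    return a;
}
```

Calling shift_example(30, 10) reproduces the two-copy case above. Note that the number of copies grows as the offset shrinks, so for a small offset and a large buffer it may be cheaper to copy through a temporary buffer instead.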