cuMemcpyDtoD with overlapping memory

Say I have a pointer to device memory and I want to use cuMemcpyDtoD on it. Specifically, the source region covers the last two-thirds of the buffer, and I want to copy it to the beginning of the buffer. In other words, the source and destination regions overlap. Can I rely on cuMemcpyDtoD to handle this correctly?


I didn’t find a formal statement for cuMemcpyDtoD but a formal statement is given here:

“The memory areas may not overlap.”

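You could perform multiple copies of at most (srcptr - destptr) elements each, one after another, starting from the front. A chunk of that size never overlaps its own destination, and because the chunks are processed front to back, each one is read before a later copy can overwrite it.

Something like: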

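int n = 30;
int* array;
cudaMalloc((void**)&array, sizeof(int) * n);  // device buffer of n ints

// want to copy array[10..29] to the front, in chunks of
// (src - dst) = 10 elements so that no single copy overlaps

cudaMemcpy(array, array + 10, sizeof(int) * 10, cudaMemcpyDeviceToDevice);
cudaMemcpy(array + 10, array + 20, sizeof(int) * 10, cudaMemcpyDeviceToDevice);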
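
For an arbitrary offset and length, the same idea can be wrapped in a loop. The following is only a rough sketch: the helper name `move_down_chunked` and its `copy_fn` parameter are made up for illustration and are not part of CUDA. It takes the non-overlapping copy primitive as a callable, so the chunking logic can be checked on the host with `std::memcpy` before plugging in a device copy.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a CUDA API): moves `count` bytes from `src` down
// to `dst`, where dst < src, using only a copy primitive that does not allow
// overlapping regions. `copy_fn(d, s, n)` must copy n bytes from s to d.
template <typename CopyFn>
void move_down_chunked(char* dst, const char* src, std::size_t count, CopyFn copy_fn) {
    if (dst >= src) return;  // this sketch only handles moving data toward the front

    // Largest chunk whose source and destination cannot overlap.
    const std::size_t gap = static_cast<std::size_t>(src - dst);

    // Walk front to back so every chunk is read before it can be overwritten.
    for (std::size_t done = 0; done < count; done += gap) {
        const std::size_t remaining = count - done;
        const std::size_t n = remaining < gap ? remaining : gap;
        copy_fn(dst + done, src + done, n);
    }
}
```

On the device you would pass something like `[](char* d, const char* s, std::size_t n) { cudaMemcpy(d, s, n, cudaMemcpyDeviceToDevice); }` as `copy_fn`. Note that the number of copy calls grows as the gap between source and destination shrinks, so for small offsets a temporary buffer may be the better choice.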