Concurrency of device-to-device copies

The CUDA docs list memory copies between two addresses in the same device memory among the operations that always run asynchronously, i.e. concurrently with the host:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution

So my question is: can device-to-device copies run concurrently with each other, the way several independent kernels can? In other words, can two or more device-to-device copies be in flight at the same time, or is that still dictated by the asynchronous copy engine?
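To make the question concrete, here is a minimal sketch of the situation I have in mind: two device-to-device copies on separate buffers, each issued with `cudaMemcpyAsync` in its own stream. The helper name `run_two_d2d_copies` and the buffer size are made up for illustration. The code only issues the copies and waits for them; it does not show by itself whether they overlap, which would have to be checked on a timeline in a profiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return 1 from the enclosing function if a CUDA runtime call fails.
// (Sketch only: buffers and streams are not released on the error path.)
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            return 1;                                              \
        }                                                          \
    } while (0)

// Hypothetical helper: issues two independent device-to-device copies,
// one per stream, and returns 0 once both have completed.
int run_two_d2d_copies(size_t bytes) {
    void *srcA = nullptr, *dstA = nullptr;
    void *srcB = nullptr, *dstB = nullptr;
    cudaStream_t s0, s1;

    CHECK(cudaMalloc(&srcA, bytes));
    CHECK(cudaMalloc(&dstA, bytes));
    CHECK(cudaMalloc(&srcB, bytes));
    CHECK(cudaMalloc(&dstB, bytes));
    CHECK(cudaMemset(srcA, 1, bytes));
    CHECK(cudaMemset(srcB, 2, bytes));

    CHECK(cudaStreamCreate(&s0));
    CHECK(cudaStreamCreate(&s1));

    // Both calls return to the host right away. Whether the device
    // actually executes the two copies at the same time is the question.
    CHECK(cudaMemcpyAsync(dstA, srcA, bytes, cudaMemcpyDeviceToDevice, s0));
    CHECK(cudaMemcpyAsync(dstB, srcB, bytes, cudaMemcpyDeviceToDevice, s1));

    // Wait for each stream to drain before releasing anything.
    CHECK(cudaStreamSynchronize(s0));
    CHECK(cudaStreamSynchronize(s1));

    CHECK(cudaStreamDestroy(s0));
    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaFree(srcA));
    CHECK(cudaFree(dstA));
    CHECK(cudaFree(srcB));
    CHECK(cudaFree(dstB));
    return 0;
}

int main() {
    // 64 MiB per buffer, an arbitrary size chosen so each copy takes
    // long enough to be visible on a profiler timeline.
    const size_t bytes = 64u << 20;
    if (run_two_d2d_copies(bytes) != 0) return 1;
    std::printf("both copies completed\n");
    return 0;
}
```

Built with `nvcc` and run under a timeline profiler, this should show whether the two copies in `s0` and `s1` overlap or run back to back.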

Thanks.