Is Unified Memory in Tegra always fast?

Hello,

On Tegra, is Unified Memory always faster than explicit cudaMemcpy calls? I am thinking of the following two cases.

case A

write to CPU memory
cudaMemcpy ( cudaMemcpyHostToDevice )
for ( several times )
{
    kernel accesses GPU memory
}
cudaMemcpy ( cudaMemcpyDeviceToHost )
read from CPU memory
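
To make case A concrete, here is a rough sketch of how I imagine it would look with Unified Memory instead of the two copies. The kernel `gpu_add_one` and the function `case_a_managed` are hypothetical names I made up for illustration, and the "access to GPU memory" is reduced to adding 1 to every element. I am assuming the CPU must not touch the managed buffer until the kernels have finished, hence the cudaDeviceSynchronize before the read.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for "access to GPU memory".
__global__ void gpu_add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Case A with Unified Memory: one CPU write, several kernel launches,
// one CPU read. Returns 0 on success, -1 on any CUDA error.
int case_a_managed(int n, int iterations, float *first_out)
{
    float *data = nullptr;
    if (cudaMallocManaged(&data, static_cast<size_t>(n) * sizeof(float)) != cudaSuccess)
        return -1;

    // "write to CPU memory": the same pointer is used by CPU and GPU,
    // so there is no cudaMemcpy(HostToDevice) here.
    for (int i = 0; i < n; ++i) data[i] = 0.0f;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it)
        gpu_add_one<<<blocks, threads>>>(data, n);

    // Wait for all kernels before the CPU reads the buffer again.
    cudaError_t err = cudaDeviceSynchronize();

    // "read from CPU memory": no cudaMemcpy(DeviceToHost) either.
    if (err == cudaSuccess) *first_out = data[0];

    cudaFree(data);
    return err == cudaSuccess ? 0 : -1;
}
```

With this sketch, only the two copies outside the loop disappear; the loop itself is unchanged.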

case B

write to CPU memory
for ( several times )
{
    cudaMemcpy ( cudaMemcpyHostToDevice )
    kernel accesses GPU memory
    cudaMemcpy ( cudaMemcpyDeviceToHost )
    access to CPU memory
}
read from CPU memory
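
For case B, this is roughly what I have in mind with Unified Memory. Again, `gpu_step` and `case_b_managed` are hypothetical names for illustration only: the GPU step adds 1 to every element and the CPU step doubles it. I am assuming a cudaDeviceSynchronize is needed in every iteration before the CPU touches the managed buffer, so the per-iteration copies are replaced by a per-iteration synchronization.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for "access to GPU memory".
__global__ void gpu_step(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Case B with Unified Memory: GPU and CPU alternate on the same buffer.
// Returns 0 on success, -1 on any CUDA error.
int case_b_managed(int n, int iterations, float *first_out)
{
    float *data = nullptr;
    if (cudaMallocManaged(&data, static_cast<size_t>(n) * sizeof(float)) != cudaSuccess)
        return -1;

    // "write to CPU memory"
    for (int i = 0; i < n; ++i) data[i] = 0.0f;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        // GPU access: no cudaMemcpy(HostToDevice) before the launch.
        gpu_step<<<blocks, threads>>>(data, n);

        // The CPU may only touch the buffer once the kernel has finished.
        if (cudaDeviceSynchronize() != cudaSuccess) {
            cudaFree(data);
            return -1;
        }

        // "access to CPU memory": no cudaMemcpy(DeviceToHost) either.
        for (int i = 0; i < n; ++i) data[i] *= 2.0f;
    }

    // "read from CPU memory"
    *first_out = data[0];
    cudaFree(data);
    return 0;
}
```

Here two copies per iteration go away, which is why I expect the difference to show up mostly in this case.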

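My guess is that Unified Memory could make case B faster, because it would remove a copy in each direction on every iteration, but that it would not help much in case A, where the data is copied only once each way. Is that correct?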

Regards
Jin