Where does unified memory reside?

I want to know where unified memory resides.
Does it reside in both CPU and GPU memory, and what is its bandwidth?

It’s my understanding that UM is supposed to reside on the GPU(s). However, if not all of your visible GPUs have peer access to each other, it will be allocated on the CPU instead. The bandwidth then depends on where the memory resides and how it is accessed. You may find this thread useful:

https://devtalk.nvidia.com/default/topic/792396/cuda-programming-and-performance/abysmal-performance-with-unified-memory-and-cublas/
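
To see which case applies on a given machine, a rough check is to ask the runtime whether every pair of visible GPUs can access each other as peers. The sketch below is only an illustration: it uses the standard runtime calls cudaGetDeviceCount, cudaDeviceCanAccessPeer and cudaMallocManaged, while the buffer size and variable names are made up for the example. It reports peer capability but does not itself tell you where the managed allocation ends up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices visible\n");
        return 1;
    }

    // Check every ordered pair of visible GPUs for peer access capability.
    bool allPeers = true;
    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < count; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            if (!canAccess) {
                allPeers = false;
                std::printf("GPU %d cannot access GPU %d as a peer\n", i, j);
            }
        }
    }
    std::printf("All visible GPUs are peers: %s\n", allPeers ? "yes" : "no");

    // Managed (unified) allocation; the size is an arbitrary example value.
    const size_t n = 1 << 20;
    float *data = nullptr;
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        std::printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The same pointer is usable from the host (and from kernels).
    for (size_t k = 0; k < n; ++k) data[k] = 1.0f;

    cudaFree(data);
    return 0;
}
```

If the check reports a non-peer pair, one thing to try (as far as I know) is limiting the visible GPUs with the CUDA_VISIBLE_DEVICES environment variable so that only peer-capable devices remain, then measuring the bandwidth again.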

Thanks very much.