Host/Device Bandwidth when multiple devices are used

I am preparing a research paper that addresses connecting multiple GPUs together and distributing work among them using MPI.

Someone told me they had read that when multiple GPU devices are connected to a single host, the host/device bandwidth is reduced. I can understand why multiple devices on a shared bus would each show lower bandwidth than a single-GPU arrangement, because the bus bandwidth is divided among all the GPUs. But is there some other factor that would make host/device bandwidth drop by more than that sharing explains, that is, each of N GPUs getting less than 1/N of the single-GPU bandwidth?
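
To make the question concrete, here is a minimal sketch of the comparison I have in mind. The function names and the numbers are hypothetical and purely illustrative; they are not measurements from any real system.

```python
def fair_share(single_gpu_gbps, num_gpus):
    """Per-GPU bandwidth if the shared bus were split evenly (the expected case)."""
    return single_gpu_gbps / num_gpus


def aggregate_efficiency(per_gpu_gbps, single_gpu_gbps):
    """Sum of concurrent per-GPU bandwidths relative to the single-GPU bandwidth.

    1.0 means the loss is fully explained by sharing; below 1.0 means
    something else is costing bandwidth as well.
    """
    return sum(per_gpu_gbps) / single_gpu_gbps


# Hypothetical values for illustration only.
single_gpu = 6.0            # GB/s seen by one GPU transferring alone
concurrent = [2.5, 2.5]     # GB/s per GPU when both transfer at once

expected = fair_share(single_gpu, len(concurrent))
efficiency = aggregate_efficiency(concurrent, single_gpu)

print("expected per-GPU share:", expected)
print("worse than pure sharing:", efficiency < 1.0)
```

If the efficiency stays at 1.0 as GPUs are added, the drop is just the bus being shared. My question is about what could push it below 1.0.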

Any insight would be appreciated.