iGPU vs discrete GPU performance

Hi all, I did a cursory search of the forums and the web but came up short.

Is there any latency improvement from using an iGPU solution like the Tegra K1 versus a discrete GPU sitting on the PCIe bus?

My CUDA projects have shown that much of the latency comes from sending data over to the GPU and fetching the results back.
With an iGPU, would that latency be reduced, since the CPU and iGPU are on the same die and there isn't a pesky PCIe bus in the way?
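
To make the question concrete, here is a minimal sketch of how the two paths could be timed on a given device. It is not taken from my actual projects: the `scale` kernel, the `CHECK` macro, and the 1M-element buffer are placeholders made up for illustration. Path 1 uses explicit `cudaMemcpy` calls in both directions; path 2 uses mapped (zero-copy) pinned host memory so the kernel accesses the host buffer directly. I am assuming the device supports mapped host memory, and the numbers will depend heavily on the platform and buffer size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call and exit main on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Placeholder kernel: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;                  // arbitrary buffer size
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Request mapped host memory support before the context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    float ms = 0.0f;

    // Path 1: explicit copies (host -> device, kernel, device -> host).
    float *h_buf = nullptr, *d_buf = nullptr;
    CHECK(cudaMallocHost((void **)&h_buf, bytes));   // pinned host buffer
    CHECK(cudaMalloc((void **)&d_buf, bytes));
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice));
    scale<<<blocks, threads>>>(d_buf, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("explicit copies: %.3f ms\n", ms);

    // Path 2: zero-copy. The kernel works on mapped host memory directly.
    float *h_mapped = nullptr, *d_alias = nullptr;
    CHECK(cudaHostAlloc((void **)&h_mapped, bytes, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h_mapped, 0));
    for (int i = 0; i < n; ++i) h_mapped[i] = 1.0f;

    CHECK(cudaEventRecord(start));
    scale<<<blocks, threads>>>(d_alias, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));   // results are visible to the host after this
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("zero-copy:       %.3f ms\n", ms);

    CHECK(cudaFreeHost(h_mapped));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_buf));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return 0;
}
```

Built with `nvcc` and run on both a discrete card and the iGPU, this would at least show whether the copy-free path actually removes the overhead I am seeing, or whether something else dominates.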

If someone could explain how the memory model works for a discrete GPU vs an iGPU, that would be great.