I’ve read that the transfer overhead between CPU and GPU is a big bottleneck in achieving high performance in GPU/CPU applications. Why is this so?

According to Nvidia’s bandwidthTest program, my CPU/GPU bandwidth is about 4 to 5 GBps. Is this the peak rate, with actual performance likely to be much lower? My application only reaches ~17 Gbps when data transfer is included in the performance measurement, a large drop from the 100+ Gbps it achieves when only the GPU computation is timed, without data transfer.

a) Cost (of mainboards)

b) Physics

c) The fact that established bus systems like PCI-x are never at the forefront of what is technologically feasible.
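
As a rough sanity check on the numbers in the question, here is a minimal back-of-the-envelope sketch in Python. It rests on several assumptions that are mine, not the poster's: "GBps" means gigabytes per second and "Gbps" gigabits per second, the copy and the kernel run strictly one after the other with no overlap, and a "round trip" means the results copied back are the same size as the input. The function name `effective_gbps` and its parameters are made up for illustration.

```python
def effective_gbps(compute_gbps, link_gbytes_per_s, copies=1):
    """Estimate end-to-end throughput in Gbit/s.

    Assumes `copies` bus transfers of the data and the GPU kernel run
    back to back, with no overlap between copying and computing.
    """
    link_gbps = link_gbytes_per_s * 8  # GB/s -> Gbit/s (assumed unit meaning)
    # Time to push one Gbit of data through every stage in sequence.
    seconds_per_gbit = copies / link_gbps + 1.0 / compute_gbps
    return 1.0 / seconds_per_gbit


if __name__ == "__main__":
    # 100 Gbps compute-only rate and 4 to 5 GB/s bus rate, as in the question.
    for link in (4.0, 5.0):
        for copies in (1, 2):  # 1 = host-to-device only, 2 = assumed round trip
            rate = effective_gbps(100.0, link, copies)
            print(f"{link} GB/s bus, {copies} copy(ies): {rate:.1f} Gbps")
```

Under these assumptions the estimate comes out at roughly 14 to 29 Gbps, so a measured ~17 Gbps is about what a 4 to 5 GBps bus would allow if nothing overlaps. Whether this actually explains the measurement depends on how the application moves its data, which the question does not say.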

Small bit of terminology pedantry: PCI, PCI-X (PCI-eXtended), PCI-e (PCI-express). Confusingly similar acronyms, but not the same things.