I’ve read that the overhead of transferring data between the CPU and GPU is a major bottleneck for performance in GPU/CPU applications. Why is this so?
According to Nvidia’s bandwidthTest program, my CPU/GPU transfer bandwidth is about 4 to 5 GB/s. Is this the peak figure, with actual transfers likely to be much slower? When data transfer is included in the measurement, my application reaches only about 17 Gbps. That is a large drop from the 100+ Gbps it achieves when I measure only the GPU computation.
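A rough back-of-the-envelope model can show how much the link alone can cost. It is a sketch under several assumptions. Copies and GPU computation run one after the other with no overlap. The 17 and 100+ figures are gigabits per second, while bandwidthTest's 4 to 5 figure is gigabytes per second. The function name and the `round_trips` parameter are hypothetical and exist only for illustration.

```python
def effective_throughput_gbps(compute_gbps, link_gbytes_per_s, round_trips=1.0):
    """Hypothetical model of end-to-end throughput when copies and compute are serialized.

    compute_gbps: GPU-only processing rate, in gigabits per second (assumed unit).
    link_gbytes_per_s: host<->device copy bandwidth, in gigabytes per second
        (the unit bandwidthTest reports).
    round_trips: link crossings per bit of input (1.0 = input copied in only,
        2.0 = input copied in plus an equal-sized result copied back).
    """
    link_gbps = link_gbytes_per_s * 8.0  # convert gigabytes/s to gigabits/s
    # Seconds spent per gigabit of input: GPU time plus time on the link.
    # The two times add because this model assumes no overlap between them.
    seconds_per_gbit = 1.0 / compute_gbps + round_trips / link_gbps
    return 1.0 / seconds_per_gbit


if __name__ == "__main__":
    # 100 Gbps compute, 4.5 GB/s link (midpoint of the bandwidthTest range).
    for trips in (1.0, 2.0):
        print(trips, round(effective_throughput_gbps(100.0, 4.5, trips), 1))
```

With these assumed numbers the model gives about 26.5 Gbps when only the input crosses the link, and about 15.3 Gbps when an equal-sized result also comes back. Both are well below the 100+ Gbps compute-only rate, because every bit must also wait for the slower link.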