On page 5 of the CUDA Programming Guide,…
applications can take advantage of it by minimizing overfetch and round-trips to DRAM and therefore becoming less dependent on DRAM memory bandwidth?
My question is:
What could this possibly mean?
What do "overfetch" and "round-trips to DRAM" mean? Why would overfetching occur? I was also wondering whether it is possible to do prefetching in some scientific CUDA code.