How does GPU page table and TLB management differ from CPUs?

While reading the paper “Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory”, I came across the following statement in Section 4.2:

In the GPU ↔ GPU case, the host does not need to update its own page tables but mainly orchestrates work on the GPUs. GPU page table updates and TLB shootdown are hardware based and relatively much faster. As a result, the host fault servicing time is more than halved.

My reading is that page unmapping and TLB shootdown are much cheaper in the GPU ↔ GPU case than in the CPU ↔ GPU case because they are done in hardware on the GPU. But I can’t find any references that explain or support this.

I’d appreciate any insights or updated references that clarify the design trade-offs between CPUs and GPUs in this regard.
