Does a kernel write back its output data from cache to global memory when it finishes executing?

Hi all!

I have a question about global memory and the cache.

If the cache does not use a “write-through” policy, when is the newest data in the cache written back?

How does the GPU ensure that the data in GDDR (global memory) is up to date when cudaMemcpy is called to copy from device to host?

Thanks in advance!

Liyan Chen
2022.09.10

A way to think about it is that the L2 cache is a proxy for device memory. Device memory accesses go through the L2 cache. Any access that goes through the L2 cache will read “updated values” as they appear in the cache.

cudaMemcpyHostToDevice → updates L2 cache (see here for an example)
Kernel1 → updates L2 cache
Kernel2 ← reads from L2 cache
cudaMemcpyDeviceToHost ← reads from L2 cache
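A minimal CUDA sketch of the sequence above, assuming a single GPU and the default stream. The kernel names `produce` and `consume` and the helper `roundTrip` are hypothetical, chosen only for illustration. Neither the kernels nor the host code issue any explicit cache flush, yet the device-to-host copy returns the values the kernels wrote.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// "Kernel1": updates the buffer; the new values may still be sitting in L2.
__global__ void produce(int* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1;
}

// "Kernel2": reads what produce() wrote; reads go through L2, so it sees the new values.
__global__ void consume(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2;
}

// Hypothetical helper: runs H2D copy -> Kernel1 -> Kernel2 -> D2H copy and checks the result.
bool roundTrip(int n) {
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) { cudaFree(dIn); return false; }

    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    int threads = 256, blocks = (n + threads - 1) / threads;
    produce<<<blocks, threads>>>(dIn, n);
    consume<<<blocks, threads>>>(dIn, dOut, n);  // same (default) stream: runs after produce()

    // No explicit flush: this blocking copy is ordered after consume() and reads through L2.
    cudaError_t err = cudaMemcpy(h.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != (i + 1) * 2) return false;
    return true;
}

int main() {
    std::printf("host sees updated values: %s\n", roundTrip(1 << 20) ? "yes" : "no");
    return 0;
}
```

Because both kernels and the blocking cudaMemcpy are issued to the default stream, they execute in order. The program relies on that ordering, not on any cache write-back, to see the updated data.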

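To answer your question, the oldest data in the cache will be written out to device memory when the cache needs to make space for new data, according to the cache eviction policy. A kernel finishing does not by itself force this write-back, and it doesn't need to: later kernels and cudaMemcpy read through the same L2 cache, so they see the updated values wherever they currently reside.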
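To connect this with eviction, the sketch below writes a buffer several times larger than L2 and then reads it back on the host. It assumes device 0 and that `cudaDeviceProp::l2CacheSize` reports the L2 size in bytes; the helper `fillLargerThanL2` is hypothetical. Since the buffer cannot fit in L2, some written lines must already have been evicted to device memory by the time the kernel ends, while others may still be in L2. The program cannot observe which values came from where; the point is only that the host copy is correct either way.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Writes a simple pattern; one thread per element.
__global__ void fill(int* buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (int)(i % 1000);
}

// Hypothetical helper: fills a buffer `factor` times the L2 size, copies it back, checks it.
bool fillLargerThanL2(int factor) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    size_t bytes = (size_t)prop.l2CacheSize * factor;
    if (bytes == 0) bytes = (size_t)1 << 24;  // fallback if no L2 size is reported
    size_t n = bytes / sizeof(int);

    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    unsigned threads = 256;
    unsigned blocks = (unsigned)((n + threads - 1) / threads);
    fill<<<blocks, threads>>>(d, n);

    // With factor > 1 the buffer does not fit in L2, so part of it was evicted to
    // device memory during the kernel; the rest may still be in L2. The copy reads
    // through L2 and sees all of it.
    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < n; ++i)
        if (h[i] != (int)(i % 1000)) return false;
    return true;
}

int main() {
    std::printf("buffer larger than L2 read back correctly: %s\n",
                fillLargerThanL2(4) ? "yes" : "no");
    return 0;
}
```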
