Does a kernel write back its output data from cache to global memory when finishing executing?

Robert_Crovella · September 10, 2022, 1:48pm

A way to think about it is that the L2 cache is a proxy for device memory. Device memory accesses go through the L2 cache. Any access that goes through the L2 cache will read “updated values” as they appear in the cache.

cudaMemcpyHostToDevice → updates L2 cache (see here for an example)
Kernel1 → updates L2 cache
Kernel2 ← reads from L2 cache
cudaMemcpyDeviceToHost ← reads from L2 cache

To answer your question, the oldest data in the cache will be written out to device memory when the cache needs to make space for new data, according to the cache eviction policy.
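
If it helps to see the idea in code, here is a rough host-side toy model in plain C++. It is not how the hardware is implemented, just a sketch of a write-back cache sitting in front of a memory array. The `ToyL2` class, the `run_sequence` helper, the two-line capacity, and the LRU policy are all made up for illustration; the actual L2 replacement policy is not stated in this thread. Note that on a real GPU you cannot peek at device memory "behind" the L2 the way this model does, which is the point of treating the L2 as a proxy.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Toy model only: a tiny write-back cache in front of a "device memory" array.
// LRU replacement and one int per line are assumptions for illustration.
class ToyL2 {
public:
    // capacity must be at least 1.
    ToyL2(std::vector<int>& device_memory, std::size_t capacity)
        : mem_(device_memory), capacity_(capacity) {}

    // Every write lands in the cache and marks the line dirty.
    void write(std::size_t addr, int value) {
        touch(addr);
        lines_[addr] = Line{value, true};
    }

    // Every read goes through the cache, so it sees the newest value.
    int read(std::size_t addr) {
        bool hit = lines_.count(addr) != 0;
        touch(addr);
        if (!hit) lines_[addr] = Line{mem_[addr], false};  // miss: fill from memory
        return lines_[addr].value;
    }

private:
    struct Line { int value; bool dirty; };

    // Make addr the most recently used line; if it is new and the cache is
    // full, evict the least recently used line first.
    void touch(std::size_t addr) {
        lru_.remove(addr);  // no-op if addr is not cached
        if (lines_.count(addr) == 0 && lines_.size() == capacity_) {
            std::size_t victim = lru_.back();
            lru_.pop_back();
            if (lines_[victim].dirty) mem_[victim] = lines_[victim].value;  // write-back
            lines_.erase(victim);
        }
        lru_.push_front(addr);
    }

    std::vector<int>& mem_;
    std::size_t capacity_;
    std::unordered_map<std::size_t, Line> lines_;
    std::list<std::size_t> lru_;
};

struct Observed {
    int kernel2_read;       // what "Kernel2" reads through the cache
    int memcpy_d2h_read;    // what the "DeviceToHost copy" reads through the cache
    int dram_before_evict;  // raw memory while the line is still cached
    int dram_after_evict;   // raw memory after the line was evicted
};

// Replays the four steps listed above, then adds unrelated traffic.
Observed run_sequence() {
    std::vector<int> dram(8, 0);
    ToyL2 l2(dram, 2);  // room for two lines only

    l2.write(0, 1);                  // "cudaMemcpyHostToDevice" updates the cache
    l2.write(0, l2.read(0) + 41);    // "Kernel1" updates the cache
    Observed o{};
    o.kernel2_read = l2.read(0);     // "Kernel2" reads from the cache
    o.memcpy_d2h_read = l2.read(0);  // "cudaMemcpyDeviceToHost" reads from the cache
    o.dram_before_evict = dram[0];   // nothing has forced a write-back yet

    l2.write(1, 7);                  // unrelated traffic fills the cache...
    l2.write(2, 9);                  // ...and pushes out address 0
    o.dram_after_evict = dram[0];
    return o;
}
```

In this model both readers see the value written by "Kernel1" even though the backing array still holds the stale value, and the backing array only catches up once unrelated writes evict the line. Nothing special happens when a "kernel" finishes.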
