CUDA - Make a specific memory access skip the cache

I have a kernel which first reads values from certain memory locations and then writes to those locations. I also have a lock which ensures that, at any point in time, only one thread is writing to any given memory location.

The kernel looks somewhat like this:

__global__ void fun(){    
    if(!lock())return; // if lock fails, return

    if(!checkCondition()){ // memory read
        release_lock();
        return;
    }

    update(); // memory write

    release_lock();
}

The kernel runs as expected when I launch it on a single block. When I launch it on multiple blocks, I get errors. I was able to determine that this happens because the changes made by an update() call are sometimes not reflected in a checkCondition() call that executes after the update() call.

If I disable the L1 cache using the compiler flags -Xptxas -dlcm=cg, the issue disappears, so I inferred that the error arises because threads from the second block read stale values from the L1 cache.
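
For reference, this is how I pass the flag when compiling (the file name is just an example):

nvcc -Xptxas -dlcm=cg -o fun fun.cu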

Disabling the L1 cache makes my program run much slower though, so I am looking for other ways to fix this error.

Is there any way to make sure that either the L1 cache is updated immediately after every relevant write, or that every relevant read bypasses the L1 cache, without completely disabling it?

You can apply the “cg” cache behaviour to an individual read by using the __ldcg() intrinsic, declared as T __ldcg(const T* address);. It performs that particular load through the L2 cache only, bypassing L1, while all your other loads keep their default caching behaviour.
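
For example, here is a minimal sketch (the flags/data pointers and the increment are placeholders, not taken from your code) showing a single load done through __ldcg while everything else is cached normally:

__global__ void fun(int *flags, int *data, int n){
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if(idx >= n) return;

    // This load is issued with the .cg modifier: it is served from L2/global
    // memory, so it cannot return a stale line sitting in this SM's L1 cache.
    int flag = __ldcg(&flags[idx]);

    if(flag != 0){
        data[idx] += 1; // ordinary accesses still use the L1 cache as usual
    }
}

In your kernel, that would mean doing the reads inside checkCondition() through __ldcg, while update() and the rest of the code keep the default caching behaviour.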