Flushing dirty L2 cache lines

rnaza005 · July 4, 2023, 9:07pm

Hi,

I have seen NV_UFLUSH_L2_FLUSH_DIRTY and MEM_OP_D_OPERATION_L2_FLUSH_DIRTY in the NVIDIA driver sources.
I first found them in driver version 390.48, and I checked that they are also present in later versions (470, and the open-source drivers).
For example, one of the files that contains these definitions is clc06f.h, inside the kernel/nvidia-uvm folder.

I wonder whether these definitions can be used to flush dirty lines in the GPU L2 cache.
If so, how can I use them in my program?

Thanks in advance,
Ravan.

Robert_Crovella · July 5, 2023, 10:57pm

Currently, there isn’t any method in CUDA to flush L2 cache lines. If you have questions about the GPU linux driver, there is a separate forum for that.

Since the L2 is a device-wide proxy for GPU DRAM memory (logical local and global spaces), it’s not immediately obvious to me why flushing the L2 would be needed.

rnaza005 · July 5, 2023, 11:04pm

I need a way to bypass the L2 cache so that each access is served directly from GPU DRAM.
Yes, my question is about the driver, not the CUDA API.

Thanks.

njuffa · July 5, 2023, 11:20pm

For what purpose? Maybe there are other ways of (approximately) accomplishing whatever it is.

In that case, the logic of asking in a subforum entitled “CUDA - CUDA Programming and Performance” escapes me.

rnaza005 · July 5, 2023, 11:26pm

I just need each access to be served from DRAM.

I was actually directed to this forum from another forum :)
I have re-asked this question in the forum mentioned above.

njuffa · July 5, 2023, 11:34pm

Why is that? For what purpose? How does normal L2 cache operation interfere with that purpose?

rnaza005 · July 5, 2023, 11:37pm

For my research.

striker159 · July 6, 2023, 5:05am

Your best bet is to use the PTX cache operators. However, they are only hints.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cache-operators

.cv Don’t cache and fetch again (consider cached system memory lines stale, fetch again).
The ld.cv load operation applied to a global System Memory address invalidates (discards) a matching L2 line and re-fetches the line on each new load.

For sm_70 and newer, there is also

no_allocate Do not allocate data to cache. This priority is suitable for streaming data.
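
As a minimal sketch of the .cv hint in CUDA C++: the __ldcv() intrinsic is the documented counterpart of a load with the .cv cache operator. The kernel name copy_cv and the helper run_copy_cv below are hypothetical names chosen for illustration. This only shows how to request the hint; as noted above, it is a hint, and nothing here guarantees that a load from device memory is actually served from DRAM rather than L2.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copies in[] to out[], reading each element through __ldcv(), which
// requests the .cv ("cache volatile") operator for the load.
// The operator is a hint; the hardware may still hit in L2.
__global__ void copy_cv(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __ldcv(&in[i]);
    }
}

// Hypothetical helper: runs the kernel on n elements and returns 0 if the
// copy is correct, 1 on a CUDA error, 2 on a data mismatch.
int run_copy_cv(int n)
{
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return 1; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_cv<<<blocks, threads>>>(d_in, d_out, n);

    // The blocking copy back also surfaces any kernel launch/execution error.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i]) return 2;
    }
    return 0;
}

int main()
{
    int rc = run_copy_cv(1 << 20);
    std::printf("copy_cv %s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

To see what the hint does on a given GPU, one option is to profile the kernel (for example with Nsight Compute) and compare L2 hit rates against a version that uses a plain load.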

rnaza005 · July 6, 2023, 5:44am

Actually, I have already tried these approaches.
no_allocate does not seem to be supported for the L2 cache: I tried L2::no_allocate, but compilation failed.
Also, I don’t want to access system memory, but the device DRAM.

Thanks.
