I have seen NV_UFLUSH_L2_FLUSH_DIRTY and MEM_OP_D_OPERATION_L2_FLUSH_DIRTY in the NVIDIA driver.
I found these in driver 390.48 and checked that they are also available in later versions (470, as well as the open-source driver).
For example, one of the files that contains these definitions is clc06f.h inside the kernel/nvidia-uvm folder.
I wonder whether these definitions can be used to flush dirty lines in the GPU L2 cache.
Also, if possible, how can I use them in my program?
Currently, there isn't any method in CUDA to flush L2 cache lines. If you have questions about the GPU Linux driver, there is a separate forum for that.
Since the L2 is a device-wide proxy for GPU DRAM (the logical local and global spaces), it's not immediately obvious to me why flushing the L2 would be needed.
.cv: Don't cache and fetch again (consider cached system memory lines stale, fetch again).
The ld.cv load operation applied to a global System Memory address invalidates (discards) a matching L2 line and re-fetches the line on each new load.
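For reference, here is a minimal sketch of how a .cv load can be issued from CUDA C++, either through the __ldcv() intrinsic or through equivalent inline PTX (the kernel and variable names are illustrative, not from the original posts):

// Minimal sketch (illustrative names): each .cv load treats a cached copy
// of the line as stale and re-fetches it instead of reusing the cache.
__global__ void read_with_cv(const unsigned int *src, unsigned int *dst)
{
    // Intrinsic form: __ldcv() maps to an ld.global.cv load.
    unsigned int a = __ldcv(src);

    // Equivalent inline-PTX form of the same load.
    unsigned int b;
    asm volatile("ld.global.cv.u32 %0, [%1];"
                 : "=r"(b)
                 : "l"(src)
                 : "memory");

    *dst = a + b;
}

Note that, as the quoted description says, the invalidate-and-refetch behaviour is defined for system memory addresses; for other addresses the cache operators are only performance hints.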
For sm_70 and newer, there is also:
no_allocate: Do not allocate data to cache. This priority is suitable for streaming data.
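As a sketch of how this hint is spelled: in the ld/st instruction syntax the eviction-priority qualifier is, as far as I can tell, only accepted at the L1 level (e.g. ld.global.L1::no_allocate), and it is a cache-allocation hint rather than a flush. Something along these lines should compile when targeting sm_70 or newer with a toolkit that supports PTX ISA 7.4+ (kernel name is illustrative):

// Minimal sketch (illustrative names): a streaming load marked with the
// L1::no_allocate eviction-priority hint via inline PTX. This only hints
// that the data should not be allocated in the cache; it does not flush
// or invalidate existing lines.
__global__ void stream_no_allocate(const unsigned int *src, unsigned int *dst)
{
    unsigned int v;
    asm volatile("ld.global.L1::no_allocate.u32 %0, [%1];"
                 : "=r"(v)
                 : "l"(src)
                 : "memory");
    *dst = v;
}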
Actually, I have tried these approaches.
no_allocate does not seem to be supported for the L2 cache. I tried L2::no_allocate, but compilation failed.
I don't want to access system memory; I want to access the device DRAM.