Wanted to know about setting aside L2 cache memory

Hi all, I want to understand why there is a need to set aside a portion of the L2 cache for accesses to a global memory region.
I also want to know why this matters particularly when we use CUDA streams and CUDA graphs.
Please refer to section 3.2.3.1, "L2 Cache Set-Aside for Persisting Accesses", and the subsequent sections of the CUDA programming guide:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/

Thanks and Regards

Nagaraj Trivedi

Hi,

You can customize L2 cache usage to a certain degree based on your use case.
For example, data that is read only once and data that is read multiple times can use different caching strategies.

cudaStreamAttrValue stream_attribute;                                         // Stream level attributes data structure
stream_attribute.accessPolicyWindow.base_ptr  = reinterpret_cast<void*>(ptr); // Global Memory data pointer
stream_attribute.accessPolicyWindow.num_bytes = num_bytes;                    // Number of bytes for persistence access.
                                                                              // (Must be less than cudaDeviceProp::accessPolicyMaxWindowSize)

When a kernel subsequently executes in the CUDA stream, memory accesses within the global memory extent [ptr..ptr+num_bytes) are more likely to persist in the L2 cache than accesses to other global memory locations.
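Putting the pieces together, here is a minimal host-side sketch, assuming a valid `stream`, a device pointer `ptr`, and a size `num_bytes` already exist. It reserves the set-aside portion of L2, attaches the access policy window to the stream, and later resets everything:

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Assumed to exist already: cudaStream_t stream; void* ptr; size_t num_bytes;

// 1. Reserve part of L2 for persisting accesses (the runtime clamps the
//    request to cudaDeviceProp::persistingL2CacheMaxSize).
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
size_t set_aside = std::min(num_bytes, (size_t)prop.persistingL2CacheMaxSize);
cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, set_aside);

// 2. Describe the window of global memory that should persist in L2
//    and attach it to the stream.
cudaStreamAttrValue stream_attribute = {};
stream_attribute.accessPolicyWindow.base_ptr  = ptr;
stream_attribute.accessPolicyWindow.num_bytes = num_bytes;
stream_attribute.accessPolicyWindow.hitRatio  = 1.0f;  // whole window gets hitProp
stream_attribute.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
stream_attribute.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &stream_attribute);

// 3. ... launch kernels into `stream`; their accesses to
//    [ptr, ptr + num_bytes) are biased to stay resident in L2 ...

// 4. When done, shrink the window to zero and flush persisting lines so
//    the set-aside region behaves normally again.
stream_attribute.accessPolicyWindow.num_bytes = 0;
cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &stream_attribute);
cudaCtxResetPersistingL2Cache();
```

Steps 1 and 4 are not in the snippet above but are needed for the window to have any effect and to release the set-aside cleanly afterwards.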

Below is the extra info we query from cudaGetDeviceProperties() for your reference:

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    8.7
  ...
  l2CacheSize/persistingL2CacheMaxSize/accessPolicyMaxWindowSize:   4194304 / 3145728 / 134213632
...

Thanks.

Hi, I read this information in the document, but my questions are different.

  1. Does it provide faster access to global memory when its contents are present/persist in the L2 cache?
  2. How is it useful w.r.t. CUDA graphs?
  3. Also, let me know the difference between these two statements, explained with a practical example. Assume that the kernels that are part of the stream also access global variables.
    stream_attribute.accessPolicyWindow.hitProp = cudaAccessPropertyPersisting; // Type of access property on cache hit
    stream_attribute.accessPolicyWindow.missProp = cudaAccessPropertyStreaming; // Type of access property on cache miss.

Please clarify.

Thanks and Regards

Nagaraj Trivedi

Hi,

1. Yes. The L2 cache is faster than global memory, so reads that hit in L2 avoid a trip to DRAM.
2. If certain data used by the CUDA graph will be read multiple times, keeping it persistent in the L2 cache can reduce the latency.
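With CUDA graphs, the same access policy window can be attached to an individual kernel node rather than a whole stream, so only that kernel's accesses are biased. A sketch, assuming an existing graph kernel node `node` plus `ptr`/`num_bytes`:

```cuda
#include <cuda_runtime.h>

// Assumed to exist already: cudaGraphNode_t node; void* ptr; size_t num_bytes;

cudaKernelNodeAttrValue node_attribute = {};
node_attribute.accessPolicyWindow.base_ptr  = ptr;
node_attribute.accessPolicyWindow.num_bytes = num_bytes;
node_attribute.accessPolicyWindow.hitRatio  = 0.6f;  // 60% of accesses get hitProp
node_attribute.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
node_attribute.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;

// Only this kernel node's accesses to [ptr, ptr + num_bytes) are biased
// to persist in L2; other nodes in the graph are unaffected.
cudaGraphKernelNodeSetAttribute(node, cudaKernelNodeAttributeAccessPolicyWindow, &node_attribute);
```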

3. For example:

    node_attribute.accessPolicyWindow.hitRatio = 0.6;                        

This indicates that a random 60% of the memory accesses in the window [ptr…ptr+num_bytes) receive the persisting property, and the remaining 40% receive the streaming property.
Persisting means the data is expected to be read multiple times and is preferentially kept in L2; streaming means the data is expected to be accessed only once, so it should not displace persisting lines.
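One practical use of a hitRatio below 1.0: if the window is larger than the persisting set-aside, marking every access persisting can make the reserved lines evict each other (thrashing). One heuristic is to scale hitRatio so the expected persisting footprint fits the set-aside. A hedged sketch (the clamping logic is an illustration; `node_attribute` and `num_bytes` as above):

```cuda
#include <cuda_runtime.h>

// Assumed to exist already: cudaKernelNodeAttrValue node_attribute; size_t num_bytes;

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);

// If hitRatio * num_bytes exceeds the set-aside, persisting lines compete
// for too little cache; scale hitRatio so the expected footprint fits.
float hit_ratio = 1.0f;
if (num_bytes > (size_t)prop.persistingL2CacheMaxSize)
    hit_ratio = (float)prop.persistingL2CacheMaxSize / (float)num_bytes;
node_attribute.accessPolicyWindow.hitRatio = hit_ratio;
```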

Thanks.