Irregular memory access patterns and the cache

Benutzer2183 · November 25, 2021, 7:00pm

I have a question regarding irregular memory access and caching. The memory coalescer of the Load/Store unit can only work efficiently if the memory access pattern is regular, so that threads access contiguous memory. How does the GPU handle caching data for irregular memory accesses? Can the GPU still somehow use the cache efficiently? I am not very familiar with this subject, so I may have misunderstood some concepts. Any explanations are appreciated.

Robert_Crovella · November 25, 2021, 7:19pm

The GPU will request lines (from the cache) or segments (from memory) as needed, to satisfy the addresses requested across the warp. Cache lines are either 128 or 32 bytes, and memory segments are 32 bytes, for all CUDA GPU architectures I am familiar with.

Therefore, you can figure out what will be present or populated in the cache by determining which 32-byte memory segments will be retrieved, to satisfy a particular load or store request.

There is no reason to assume anything else gets cached (speculative prefetching) as a result of the transactions themselves.

Suppose memory segments are arranged starting from address 0, in 32 byte groups.

Suppose across a warp, we request an int value from int index 2 and an int value from int index 1024.

After those transactions are serviced, I would expect the bytes from 0…31 and the bytes from 4096…4127 to be resident in the L2 cache.
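To make the arithmetic above concrete, here is a minimal host-side sketch in Python. It only models the mapping from requested addresses to 32-byte segments as described in this reply; it is not a model of real hardware behavior. The function name `segments_touched` and its parameters are hypothetical, and it assumes 4-byte int elements with segments aligned from address 0.

```python
def segments_touched(int_indices, elem_size=4, segment_size=32):
    """Return the (first_byte, last_byte) range of every segment a warp touches.

    Assumptions (illustrative only): elements are elem_size bytes wide and
    segments are segment_size bytes, laid out starting from address 0.
    """
    segment_ids = set()
    for idx in int_indices:
        first_byte = idx * elem_size
        last_byte = first_byte + elem_size - 1
        # An element could straddle a segment boundary, so cover both ends.
        for seg in range(first_byte // segment_size, last_byte // segment_size + 1):
            segment_ids.add(seg)
    return [(s * segment_size, s * segment_size + segment_size - 1)
            for s in sorted(segment_ids)]


# The example from this post: int index 2 and int index 1024.
print(segments_touched([2, 1024]))                  # [(0, 31), (4096, 4127)]

# 32 threads reading consecutive ints touch 4 segments (128 bytes).
print(len(segments_touched(range(32))))             # 4

# 32 threads reading ints 32 elements apart touch 32 separate segments.
print(len(segments_touched(range(0, 32 * 32, 32)))) # 32
```

The last two calls show why an irregular pattern is costly under this model: the same 128 bytes of useful data pull in 4 segments in the contiguous case and 32 segments in the scattered case.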

