I see, thank you very much for the explanation! I've wondered why CPUs don't use a combined L1-cache/shared-memory approach and let the programmer explicitly place data in the cache. It seems very helpful to have both an automatically hardware-managed cache and a programmer-controlled cache, like shared memory on GPUs, so that explicit cache control is at our disposal when we need it. Are there any reasons CPUs are not designed that way?
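For context, here is a minimal CUDA sketch of what "programmer-controlled cache" means in practice: the kernel explicitly stages a tile of global memory into `__shared__` storage before reusing it, whereas a plain read through `in[i]` would simply be cached (or not) by the hardware-managed L1 with no programmer involvement. Kernel and variable names are illustrative, not from the discussion above.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each block explicitly stages a tile of `in`
// into shared memory (the programmer-controlled cache), then reads
// it back in a different order. A direct read of in[i] would instead
// go through the hardware-managed L1 cache automatically.
__global__ void reverse_tile(const float* in, float* out, int n)
{
    __shared__ float tile[256];          // explicitly placed in shared memory

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];       // global -> shared, explicit copy

    __syncthreads();                     // wait until the whole tile is staged

    if (i < n)                           // reuse from shared memory, reversed
        out[i] = tile[blockDim.x - 1 - threadIdx.x];
}

int main()
{
    const int n = 1024;                  // assume n is a multiple of 256
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    reverse_tile<<<n / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %.0f\n", out[0]);   // expect 255: last element of the first tile
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```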