CPU cache vs. GPU shared memory

hi all,
i have a basic question :shifty: : why can't I use the CPU cache (L1 or L2, e.g. 256 KB) explicitly, the way GPU shared memory is used? (is there just no CUDA-like API for it?) Has anyone seen a thread about this?

You have been able to do something like that on x86 for about 10 years. SSE added the PREFETCHT0, PREFETCHT1, PREFETCHT2 and PREFETCHNTA instructions, which let you pre-fetch data into the cache hierarchy (with a hint about which level) before you need it.

Still, prefetched data can be evicted from the cache by instructions or other data while you are working on it. I think the cache was never made explicitly controllable because: 1) caches didn't exist when the x86 ISA was defined; 2) it would be difficult to keep applications portable as cache sizes changed across generations (NVIDIA seems willing to live with this, whereas Intel/AMD historically have not); 3) making the cache controller and memory data path more complicated might lead to slightly slower caches; and 4) x86 CPUs were not historically designed for high-performance computing (there was not a single x86 machine in the TOP500 until 1994), so their users were unlikely to actually use scratchpads.

Some architectures do provide a means to lock down a cache region to prevent its contents from being evicted. This is important for some embedded systems with real-time constraints.