For a school project I am implementing a DSP algorithm in CUDA. At this point I have the algorithm working, but the program does not use shared memory; in other words, it reads and writes data directly in global memory.
As I understand it, there is no implicit caching for global memory, so if I want to exploit data access locality by caching data in shared memory, I will have to write custom code to manage the cache.
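To make the manual approach concrete, here is a minimal sketch of staging data in shared memory by hand, using a causal FIR filter as a stand-in for the DSP algorithm. The kernel name `firShared`, the helper `maxFirError`, the constants `TILE` and `TAPS`, and the test signal are all made up for illustration; the sketch assumes the block size equals `TILE`, uses unified memory for brevity, and omits error checking.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define TILE 256  // threads per block, also samples per tile (assumed value)
#define TAPS 16   // filter length (assumed value)

// y[i] = sum over k of h[k] * x[i - k], treating x[j] as 0 for j < 0.
// Must be launched with blockDim.x == TILE.
__global__ void firShared(const float* x, const float* h, float* y, int n)
{
    // One tile of input plus the TAPS-1 samples that precede it (the halo).
    __shared__ float tile[TILE + TAPS - 1];

    int blockStart = blockIdx.x * TILE;
    int gid = blockStart + threadIdx.x;

    // Cooperative load: tile[s] holds x[blockStart - (TAPS - 1) + s].
    for (int s = threadIdx.x; s < TILE + TAPS - 1; s += blockDim.x) {
        int src = blockStart - (TAPS - 1) + s;
        tile[s] = (src >= 0 && src < n) ? x[src] : 0.0f;
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    if (gid < n) {
        float acc = 0.0f;
        for (int k = 0; k < TAPS; ++k)
            acc += h[k] * tile[threadIdx.x + (TAPS - 1) - k];
        y[gid] = acc;
    }
}

// Runs the kernel and returns the largest deviation from a CPU reference.
float maxFirError(int n)
{
    float *x, *h, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&h, TAPS * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = (float)(i % 7) - 3.0f;
    for (int k = 0; k < TAPS; ++k) h[k] = 1.0f / (k + 1);

    int blocks = (n + TILE - 1) / TILE;
    firShared<<<blocks, TILE>>>(x, h, y, n);
    cudaDeviceSynchronize();

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) {
        float ref = 0.0f;
        for (int k = 0; k < TAPS && k <= i; ++k) ref += h[k] * x[i - k];
        maxErr = fmaxf(maxErr, fabsf(ref - y[i]));
    }
    cudaFree(x); cudaFree(h); cudaFree(y);
    return maxErr;
}

int main()
{
    printf("max abs error vs CPU: %g\n", maxFirError(1000));
    return 0;
}
```

Each input sample is read from global memory once per block (plus the halo) instead of up to `TAPS` times, which is the kind of locality I would like a cache manager to capture automatically.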
So I am wondering: has anyone written such a cache manager? It could mimic the cache controller of a CPU: every memory access goes through the manager, which checks whether the data is in the cache. If it is, the manager returns the cached data; otherwise it handles the miss automatically by loading the data from global memory. Obviously there are other things to consider, such as conflict management and so on.
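Roughly, the hit/miss logic I have in mind could look like the sketch below. To stay free of races it makes strong simplifying assumptions: the cache is read-only, direct-mapped, holds one float per line, and each thread owns a private set of `SLOTS` lines in shared memory. A cache shared by a whole block would need synchronization on top of this. All names (`ThreadCache`, `cachedRead`, `windowSum`, `runCacheDemo`) are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 64  // threads per block (assumed value)
#define SLOTS 4     // cache lines per thread, one float each (assumed value)

// Per-thread, direct-mapped, read-only cache stored in shared memory.
// Slot s of a thread lives at offset s * THREADS from its base pointers.
struct ThreadCache {
    int*   tags;    // global index currently held in each slot, -1 if empty
    float* data;    // cached values
    int    misses;  // number of misses served so far
};

__device__ float cachedRead(ThreadCache& c, const float* g, int idx)
{
    int slot = (idx % SLOTS) * THREADS;  // direct-mapped slot selection
    if (c.tags[slot] != idx) {           // miss: fetch from global memory
        c.data[slot] = g[idx];
        c.tags[slot] = idx;
        ++c.misses;
    }
    return c.data[slot];                 // hit, or the line just filled
}

// Each thread sums the window x[gid .. gid + SLOTS - 1] four times over,
// so only the first pass should miss.
__global__ void windowSum(const float* x, float* out, int* missCount, int n)
{
    __shared__ int   tags[SLOTS * THREADS];
    __shared__ float data[SLOTS * THREADS];

    ThreadCache c = { tags + threadIdx.x, data + threadIdx.x, 0 };
    for (int s = 0; s < SLOTS; ++s) c.tags[s * THREADS] = -1;

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid + SLOTS > n) return;  // safe: no block-wide barrier is used

    float acc = 0.0f;
    for (int pass = 0; pass < 4; ++pass)
        for (int k = 0; k < SLOTS; ++k)
            acc += cachedRead(c, x, gid + k);
    out[gid] = acc;
    atomicAdd(missCount, c.misses);
}

// Returns true if every sum is correct and each thread missed SLOTS times.
bool runCacheDemo()
{
    const int blocks = 4, nThreads = blocks * THREADS;
    const int n = nThreads + SLOTS - 1;
    float *x, *out; int* missCount;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&out, nThreads * sizeof(float));
    cudaMallocManaged(&missCount, sizeof(int));
    for (int i = 0; i < n; ++i) x[i] = (float)i;
    *missCount = 0;

    windowSum<<<blocks, THREADS>>>(x, out, missCount, n);
    cudaDeviceSynchronize();

    bool ok = (*missCount == nThreads * SLOTS);
    for (int i = 0; i < nThreads; ++i)
        ok = ok && (out[i] == 16.0f * i + 24.0f);  // 4 * (4*i + 0+1+2+3)
    cudaFree(x); cudaFree(out); cudaFree(missCount);
    return ok;
}

int main()
{
    printf("cache demo %s\n", runCacheDemo() ? "passed" : "FAILED");
    return 0;
}
```

The tag check on every access is the overhead a generic manager would add compared with hand-written staging, and two addresses that map to the same slot would evict each other, which is the conflict problem mentioned above.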
Obviously it is not hard to write one myself, but it seems to be a pretty common task in CUDA projects, so I thought I should ask before I reinvent the wheel.