I’ve been trying to characterize Fermi’s caches with some microbenchmarking, inspired by the “Micro-benchmarking the GT200 GPU” paper and my own curiosity. Borrowing and adapting some of the code from that paper, I wrote a benchmark meant to reveal cache characteristics such as set associativity and hit/miss latencies. However, some of the data I got was unexpected. The code and the resulting charts are attached.
A single thread steps through arrays of varying size with a fixed stride. I was able to confirm the 128 B line size, the two L1 size settings, and the L2 size. The strange behavior appears in the transition from L1 to L2. I expected the average access time to rise roughly linearly as the array increasingly exceeds the L1 size. Instead, it plateaus for a while before reaching what looks like the L2 latency. This happens in both L1 configurations (16 KB and 48 KB), but at different array sizes and over different intervals.
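For anyone unfamiliar with this kind of setup, the access pattern of a stride-based pointer chase in the style of the GT200 paper can be sketched on the host side in Python. This is a minimal sketch, not the code in l1cache.cu: it assumes 4-byte array elements and the layout a[i] = (i + stride) % n, and the helper names build_chase_array, chase_sequence and lines_touched are hypothetical. It only computes which indices the single thread would load (j = a[j]) and how many distinct 128 B lines those loads touch.

```python
from math import gcd

def build_chase_array(n_elems, stride_elems):
    # Element i holds the index of the next element to load, so a thread
    # doing j = a[j] visits 0, stride, 2*stride, ... (mod n_elems).
    return [(i + stride_elems) % n_elems for i in range(n_elems)]

def chase_sequence(a, steps, start=0):
    # Replay the dependent loads j = a[j] and record the indices visited.
    j = start
    visited = []
    for _ in range(steps):
        visited.append(j)
        j = a[j]
    return visited

def lines_touched(array_bytes, stride_bytes, line_bytes=128, elem_bytes=4):
    # Loads in one full cycle of the chase, and distinct cache lines they touch.
    # Assumes elem_bytes-sized elements (hypothetical; match the real kernel).
    n = array_bytes // elem_bytes
    s = stride_bytes // elem_bytes
    cycle = n // gcd(n, s)  # the chase returns to index 0 after this many loads
    a = build_chase_array(n, s)
    lines = {j * elem_bytes // line_bytes for j in chase_sequence(a, cycle)}
    return cycle, len(lines)

print(lines_touched(16 * 1024, 128))  # (128, 128): every load touches a new line
print(lines_touched(16 * 1024, 64))   # (256, 128): two loads per 128 B line
```

With a stride below the line size, only every other load can miss, which is how the 128 B line size shows up in the latency curves.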
Some ideas that have been thrown around, none of them really convincing:
- Prefetching: possible, but it’s strange that it would take that long to learn the access pattern.
- Replacement policy effects: I haven’t thought this one through yet.
- A NUCA (non-uniform cache access) topology: locality might explain a plateau with the 48 KB L1, but probably not the fact that it also occurs at a smaller array size with the 16 KB L1.
Mainly, it’s the fact that the plateau shows up in both L1 configurations, at different sizes and over different intervals, that undermines these guesses.
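One way to reason about the replacement-policy idea is a toy set-associative cache model driven by the same stride pattern. This sketch is not a model of Fermi: the associativity (4 ways), the hit/miss costs, and the names SetAssocCache and average_latency are all hypothetical. In this model, strict LRU turns a cyclic stride pattern from all hits into all misses as soon as each set holds more lines than it has ways, while random replacement keeps some hits and rises gradually instead.

```python
import random
from collections import OrderedDict
from math import gcd

class SetAssocCache:
    """Toy set-associative cache that only tracks which lines are resident."""

    def __init__(self, size_bytes, line_bytes, ways, policy="lru", seed=0):
        if policy not in ("lru", "random"):
            raise ValueError("policy must be 'lru' or 'random'")
        self.line_bytes = line_bytes
        self.ways = ways  # hypothetical associativity
        self.n_sets = size_bytes // (line_bytes * ways)
        # Each set maps resident line numbers to None, least recently used first.
        self.sets = [OrderedDict() for _ in range(self.n_sets)]
        self.policy = policy
        self.rng = random.Random(seed)

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        s = self.sets[line % self.n_sets]
        if line in s:
            s.move_to_end(line)  # refresh recency (only matters for LRU)
            return True
        if len(s) >= self.ways:
            if self.policy == "lru":
                s.popitem(last=False)  # evict the least recently used line
            else:
                del s[self.rng.choice(list(s))]  # evict a random resident line
        s[line] = None
        return False

def average_latency(array_bytes, stride_bytes, cache, hit_cost, miss_cost,
                    passes=8, elem_bytes=4):
    """Modelled average cost per load of a stride pointer chase.

    One untimed warm-up pass, then `passes` timed passes over the cycle.
    hit_cost and miss_cost are hypothetical, not measured Fermi latencies.
    """
    n = array_bytes // elem_bytes
    s = stride_bytes // elem_bytes
    addrs = [((k * s) % n) * elem_bytes for k in range(n // gcd(n, s))]
    for a in addrs:  # warm-up pass, not counted
        cache.access(a)
    total = 0
    for _ in range(passes):
        for a in addrs:
            total += hit_cost if cache.access(a) else miss_cost
    return total / (passes * len(addrs))

kb = 1024
for size in (16 * kb, 20 * kb):
    lru = average_latency(size, 128, SetAssocCache(16 * kb, 128, 4, "lru"), 1, 10)
    rnd = average_latency(size, 128, SetAssocCache(16 * kb, 128, 4, "random"), 1, 10)
    print(size // kb, "KB  lru:", lru, " random:", round(rnd, 2))
```

Sweeping array_bytes across the transition for both SetAssocCache(16 * kb, ...) and SetAssocCache(48 * kb, ...) and plotting the result next to the measured curves could show whether either policy produces a shape resembling the observed plateau; it cannot by itself say which policy the hardware uses.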
Does anyone have an idea, or a better-educated guess, about what is going on? If some hardware mechanism really is causing this plateau, it would be neat to figure out how to exploit it.
Oh, this was tested on a GTX 480. It would also be interesting to see what happens on other models.
Hopefully I’m not making any stupid assumptions or mistakes in the code. If you’re interested in the script used to generate the graphs, it’s here: graph.py (linked because I can’t attach it).
l1cache.cu (4.83 KB)