Insight on performance of GTX 480 for LIB benchmark


I am executing the LIB benchmark on a GTX 480 in two scenarios: one with the L1 cache enabled, and one bypassing the L1 cache.
I found the following results:

                                      Execution Time (ms)

Benchmark   GridSize   BlockSize   With L1 Cache   Without L1 Cache (bypassed)
LIB         64         16          14.179          14.223
LIB         64         32          7.315           7.34
LIB         64         64          4.47            4.464
LIB         64         128         4.491           4.506
LIB         64         256         4.593           4.6

I am not able to understand why the execution time is higher when the benchmark bypasses the L1 cache.
Any insights on this?


First of all, there doesn’t seem to be much difference: less than 1% in every case.

Second, if the L1 cache is providing any benefit, and you bypass it, then you won’t receive that benefit. For a benchmark, it seems quite evident to me that this could make the code run slower, i.e. “the execution time is more”.

That doesn’t explain the 64,64 case (where bypassing L1 was marginally faster), but I would suggest that these differences are so small that no real conclusions can be drawn.
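
For reference, on Fermi-class GPUs such as the GTX 480 (compute capability 2.0), L1 caching of global loads is typically controlled at compile time via the `-dlcm` option to `ptxas`. A minimal sketch of the two build configurations, assuming the benchmark source is in a file called `lib.cu` (a hypothetical filename):

```shell
# Default "cache-all": global loads are cached in both L1 and L2
nvcc -arch=sm_20 -Xptxas -dlcm=ca -o lib_l1 lib.cu

# "Cache-global": bypass L1, global loads are cached in L2 only
nvcc -arch=sm_20 -Xptxas -dlcm=cg -o lib_nol1 lib.cu
```

It may be worth double-checking that the "without L1" runs were actually built with `-dlcm=cg`; if both binaries were compiled identically, the sub-1% gaps above would simply be run-to-run noise.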