How can a kernel be both cache bound and not memory bound?

laughingrice · October 31, 2011, 10:38pm

I’m playing around with analyzing my kernel and as a start I checked how L1 cache bound it is and how global memory bound it is.

Changing the L1 cache size from 16KB to 48KB produced a 20% speedup, which seems to indicate that the kernel is memory bound (or at least cache bound).
On the other hand, reducing the memory clock by 30% while leaving the graphics clock unchanged (that is, lowering memory performance relative to compute) made no difference in run time, so the kernel seems to be completely compute bound.
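
As a rough sanity check on what the memory clock experiment can and cannot show, here is a minimal sketch. It assumes a simplified overlap model in which run time is the larger of compute time and DRAM transfer time, and in which DRAM bandwidth scales linearly with the memory clock. The model and the function names are illustrative assumptions, not measurements from this kernel.

```python
# Toy overlap model (an assumption, not a measurement):
#   run_time = max(compute_time, dram_time)
# with DRAM bandwidth proportional to the memory clock.

def predicted_runtime(compute_time, dram_time, mem_clock_scale):
    """Run time after scaling the memory clock by mem_clock_scale (e.g. 0.7)."""
    # Lower memory clock -> lower bandwidth -> longer DRAM transfer time.
    scaled_dram_time = dram_time / mem_clock_scale
    return max(compute_time, scaled_dram_time)

def slowdown_if_dram_bound(mem_clock_scale):
    """Slowdown factor expected if the kernel were purely DRAM bandwidth bound."""
    return 1.0 / mem_clock_scale

def max_dram_fraction_without_slowdown(mem_clock_scale):
    """Largest dram_time / run_time ratio that still shows no slowdown."""
    # No slowdown requires dram_time / mem_clock_scale <= compute_time.
    return mem_clock_scale

if __name__ == "__main__":
    scale = 0.7  # memory clock reduced by 30%
    print("slowdown if purely DRAM bound:", slowdown_if_dram_bound(scale))
    print("max DRAM share with no slowdown:", max_dram_fraction_without_slowdown(scale))
```

Under this model, a purely DRAM bandwidth bound kernel would run about 1.43x longer at the reduced clock. Seeing no change only says that DRAM transfer time was at most about 70% of the run time to begin with. It says nothing about traffic served by L1 or L2, or about latency effects, none of which this sketch models.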

How do these two results fit together? I would expect the kernel to be either bandwidth bound or not bandwidth bound. The only explanation I can see is that it is very L2 intensive but not global memory intensive. Can that be the case?

Thanks