Cache information about Jetson TK1 GPU

Hi all! I would like to know the detailed cache configuration of the Jetson TK1 GPU (Kepler GK20A). As far as I can find in the literature, it has two levels of cache:

L1: 64KB in total, split configurably with shared memory; 128B cache lines; 4-way set associative.
L2: 128KB.

I could not find any further information about the L2 cache, such as its cache line size or associativity. I tried several microbenchmarking tools to measure these parameters, but their results do not agree with each other.
(Tools: [url]http://www.stuffedcow.net/research/cudabmk[/url], [url]http://www.comp.hkbu.edu.hk/~chxw/gpu_benchmark.html[/url])
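For anyone unfamiliar with how such tools infer line size and associativity: they typically pointer-chase through an array (each element holds the index of the next one) while sweeping the array size and stride, then look for jumps in latency. The sketch below is a toy model of that idea, not either tool's code. It simulates an idealized set-associative cache with LRU replacement and reports the miss rate, which stands in for the latency jump. The cache geometry in the test (1KB, 32B lines, 4 ways) is hypothetical and chosen only to make the arithmetic easy. It is not GK20A's L2. Real GPU caches can differ from this model (for example non-LRU replacement or hashed set indexing), which is one plausible reason different tools disagree.

```python
from collections import OrderedDict


class SetAssocLRUCache:
    """Idealized set-associative cache with LRU replacement (toy model, not real hardware)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        # sets = capacity / (line size * associativity)
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict least recently used
        cache_set[tag] = None
        return False


def build_chain(array_bytes, stride_bytes):
    """Map each visited address to the next one (addr + stride, wrapping to 0)."""
    chain = {}
    for addr in range(0, array_bytes, stride_bytes):
        nxt = addr + stride_bytes
        chain[addr] = nxt if nxt < array_bytes else 0
    return chain


def measure_miss_rate(size_bytes, line_bytes, ways, array_bytes, stride_bytes, passes=4):
    """Pointer-chase the chain on a fresh cache; return the miss rate after one warm-up pass."""
    cache = SetAssocLRUCache(size_bytes, line_bytes, ways)
    chain = build_chain(array_bytes, stride_bytes)
    steps = len(chain)
    addr = 0
    for _ in range(steps):  # warm-up: one full trip around the chain
        cache.access(addr)
        addr = chain[addr]
    misses = 0
    for _ in range(steps * passes):  # timed passes in a real benchmark
        if not cache.access(addr):
            misses += 1
        addr = chain[addr]
    return misses / (steps * passes)


if __name__ == "__main__":
    # Line size: on an array larger than the cache, miss rate = stride / line size
    # until the stride reaches the line size, then it saturates at 1.0.
    for stride in (4, 8, 16, 32, 64):
        print("stride", stride, measure_miss_rate(1024, 32, 4, 2048, stride))
    # Associativity: with stride = size / ways, every access maps to one set,
    # so misses start once the chain holds ways + 1 elements.
    for n in range(1, 7):
        print("elements", n, measure_miss_rate(1024, 32, 4, 256 * n, 256))
```

In this model the stride at which the miss rate first reaches 1.0 gives the line size, and the element count at which same-set accesses start missing gives the associativity plus one. On real hardware the same sweeps are read from latency curves, which are noisier.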

Can anyone help me with this? Is there any documentation of the detailed L2 cache configuration for the Jetson TK1 GPU?

A very vague observation, but perhaps relevant: caching will differ on a GPU connected directly to a memory controller (as on the Jetsons) versus one connected over a PCIe bus (most common desktop products). On the Jetson, I believe setting up pinned memory for the GPU may disable some caching operations. In that case I would expect odd results from caching experiments, because the cache would only apply to part of the memory. I have no details about where the cache would be used versus disabled.