L2 transfer overhead (profiler bug?)

I have already written this up on Stack Overflow, so here is the link to the question: http://stackoverflow.com/questions/34455643/cuda-l2-transfer-overhead

The awesome Scott Gray (@scottgray76) pointed out to me on Twitter that this might be a bug in the profiler and that those numbers could be bogus.

Does anybody see this differently? Am I missing something here?

Thank you