Just for reference, I see a diagram at figure 119, page 2261, section 32.0 of the Tegra K1 TRM. Is this the diagram you are looking at?
I cannot answer your question, but perhaps this will help. The Tegra K1 TRM, section 188.8.131.52, describes bit 11 of the device control and device status registers. It seems there are conditions under which the no snoop bit may be set, which implies that hardware-enforced cache coherency no longer applies. I have no idea whether this bit is set under L4T, but it seems cache coherency may be enforced in hardware as long as certain bits are not set.
Related to this, bits 23 and 27 also seem important. In the case of bit 27, setting the bit apparently results in non-coherency. Again, I do not know what these bits are set to under L4T, but this could be a way of enabling or disabling what you need. Someone else would need to comment on how these bits are currently set and whether changing them might break something else.
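
In case it helps when someone does read the register back, here is a minimal Python sketch of testing the bits mentioned above in a 32-bit register value. This is only an illustration under assumptions: the names `bit_is_set` and `describe_bits` and the example value are hypothetical, the register value has to be obtained by some other means (not shown), and the bit meanings are only what I gathered from the TRM description, not something I have verified on L4T.

```python
# Bit positions mentioned in this post. Their exact meaning should be
# checked against the TRM; the label for bit 11 is an assumption.
NO_SNOOP_BIT = 11          # described in the TRM as the no snoop bit
OTHER_BITS = (23, 27)      # also appear to affect coherency


def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit position is 1 in value."""
    return (value >> bit) & 1 == 1


def describe_bits(reg_value: int) -> dict:
    """Map each bit of interest (11, 23, 27) to whether it is set."""
    bits = (NO_SNOOP_BIT,) + OTHER_BITS
    return {bit: bit_is_set(reg_value, bit) for bit in bits}


if __name__ == "__main__":
    # Hypothetical register value with only bit 11 set.
    example_value = 0x0000_0800
    for bit, is_set in describe_bits(example_value).items():
        print(f"bit {bit}: {'set' if is_set else 'clear'}")
```

If bit 11 shows as set, that would at least suggest no snoop is enabled for the device. What bits 23 and 27 imply in practice still needs confirmation from someone who knows the L4T configuration.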