Fermi: disabling the cache. Difference between cs and cv (volatile)?

How do you disable caching of data in L1 and L2 as much as possible? (Fermi architecture)
I saw in previous posts that you can disable the L1 cache, and reduce caching in L2 as much as possible, by compiling with the following option:

-Xptxas -dlcm=cs

cs means cache streaming, cv means cache volatile.
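
For reference, here is a rough sketch of how each of the load cache modifiers would be passed to ptxas through nvcc. The file name kernel.cu is hypothetical, and -arch=sm_20 assumes a Fermi target. The comments summarize my reading of the PTX cache operators and are not measured behaviour:

```shell
# kernel.cu is a placeholder source file; sm_20 assumes a Fermi GPU.

# ca: cache at all levels (L1 and L2); this is the default for global loads
nvcc -arch=sm_20 -Xptxas -dlcm=ca -o kernel_ca kernel.cu

# cg: cache global; loads are cached in L2 only and bypass L1
nvcc -arch=sm_20 -Xptxas -dlcm=cg -o kernel_cg kernel.cu

# cs: cache streaming; lines are still cached but marked evict-first,
#     on the assumption that the data is accessed only once or twice
nvcc -arch=sm_20 -Xptxas -dlcm=cs -o kernel_cs kernel.cu

# cv: cache volatile; treat cached lines as stale and fetch again on each load
nvcc -arch=sm_20 -Xptxas -dlcm=cv -o kernel_cv kernel.cu
```

Building one binary per modifier like this makes it possible to time the same kernel under each setting and compare.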

Isn’t volatile supposed to degrade performance even more?
With cv, does the SM fetch from memory on every load, or can it still end up fetching from a cache and getting a hit?
I ask because cv unexpectedly gives better performance than cs.

Can anyone suggest how I can avoid using L2 and L1 as much as possible?
Thanks!