The code comments aplattner posted from nv-vm.c talk about change_page_attr(). Digging into arch/x86/mm/pageattr.c (in a 3.0.x kernel), we find that this function calls the cpa_flush_*() family:
	/*
	 * On success we use clflush, when the CPU supports it to
	 * avoid the wbindv. If the CPU does not support it and in the
	 * error case we fall back to cpa_flush_all (which uses
	 * wbindv):
	 */
	if (!ret && cpu_has_clflush) {
		if (cpa.flags & (CPA_PAGES_ARRAY | CPA_ARRAY)) {
			cpa_flush_array(addr, numpages, cache,
					cpa.flags, pages);
		} else
			cpa_flush_range(baddr, numpages, cache);
	} else
		cpa_flush_all(cache);
This code is clearly sensitive to CPU capabilities: cpa_flush_range()/cpa_flush_array() use a lighter-weight method of cache invalidation, clflush() (see arch/x86/include/asm/system.h), instead of wbinvd().
If we assume Linux is working properly, then there is no need to flush the cache in nv-vm.c::nv_flush_cache(). NVIDIA says this doesn’t always work, hence the wbinvd(). Unfortunately, nv-vm.c doesn’t give us any more information.
Here are the code comments in nv-vm.c from an older 27x-era driver. They’re a little different:
/*
* Cache flushes and TLB invalidation
*
* Allocating new pages, we may change their kernel mappings' memory types
* from cached to UC to avoid cache aliasing. One problem with this is
* that cache lines may still contain data from these pages and there may
* be then stale TLB entries.
*
* The Linux kernel's strategy for addressing the above has varied since
* the introduction of change_page_attr(): it has been implicit in the
* change_page_attr() interface, explicit in the global_flush_tlb()
* interface and, as of this writing, is implicit again in the interfaces
* replacing change_page_attr(), i.e. set_pages_*().
*
* In theory, any of the above should satisfy the NVIDIA graphics driver's
* requirements. In practise, none do reliably:
*
* - some Linux 2.4 kernels (e.g. vanilla 2.4.27) did not flush caches
* on CPUs with Self Snoop capability, but this feature does not
* interact well with AGP.
*
* - most Linux 2.6 kernels' implementations of the global_flush_tlb()
* interface fail to flush caches on all or some CPUs, for a
* variety of reasons.
*
* Due to the above, the NVIDIA Linux graphics driver is forced to perform
* heavy-weight flush/invalidation operations to avoid problems due to
* stale cache lines and/or TLB entries.
*/
Here, the comments state that the 2.6 kernel only needs a TLB flush. This implies to me that commenting out the call to the CACHE_FLUSH() macro in nv_flush_cache() should be safe.* I think this is a better solution than changing CACHE_FLUSH() into a noop. Why did the comments in nv-vm.c change? Did an engineer get overly zealous in cleaning up comments when AGP or 2.4 kernel support was dropped? Did NVIDIA learn of other instances where the 2.6 kernel also needed a cache flush? We’ll never know.
I think the best you can do is register a bug with NVIDIA and hope that they task an engineer to reevaluate the situation. This is such a low-level and fundamental part of memory management that I could see NVIDIA being extremely hesitant to make any official changes unless wbinvd() starts to create serious problems for important customers (AAA games on Linux). I’m not surprised that you’ve hit a bug related to latency: the Linux driver hasn’t really had to support low-latency operations until recently. Hopefully SteamOS will help motivate a change.
In the meantime, I think the best you can do is test your system with CACHE_FLUSH() commented out of nv_flush_cache() and hope for the best. I’ll test it out on my system as well.
* It should be safe, assuming that the caller is invoking nv_flush_cache() because it changed memory attributes via change_page_attr() or set_pages_*() and not for some other reason. This appears to be the case: since nv_flush_cache() is a static function, we can limit the scope of the code that needs to be reviewed to nv-vm.c.