The January 2015 paper Constant Buffers without Constant Pain has two statements that, at first glance, seem at odds:
When testing with a GTX 970 (driver version 352.86) on Win 7, we aren’t seeing a performance difference between Map/memcpy/Unmap and UpdateSubResource. However, on Win 10 (also with a GTX 970), Map/memcpy/Unmap consistently underperforms UpdateSubResource by 7-12% (which roughly matches the perf difference in the paper).
Is there a technical reason for the performance differential between Win 7 and Win 10? We’ve had internal speculation that UpdateSubResource may not actually be limited by the 128MB rename buffer.
Is there any scenario where Map/memcpy/Unmap will outperform UpdateSubResource for per-draw-call constant buffer updates?