Constant Buffers - Map/Unmap vs UpdateSubresource

The January 2015 paper Constant Buffers without Constant Pain [1] makes two statements that, at first glance, seem at odds:

  • Map is slower (256 cycles) than UpdateSubresource (214 cycles)
  • NVIDIA drivers map UpdateSubresource() to Map(MAP_WRITE_DISCARD)->memcpy()->Unmap()

When testing with a GTX 970 (driver version 352.86) on Win 7, we aren’t seeing a performance difference between Map/memcpy/Unmap and UpdateSubresource. However, on Win 10 (also with a GTX 970), Map/memcpy/Unmap consistently underperforms UpdateSubresource by 7-12%, which roughly matches the performance difference in the paper.

    Is there a technical reason for the performance difference between Win 7 and Win 10? We’ve had internal speculation that UpdateSubresource may not actually be limited by the 128MB rename buffer.

    Is there any scenario where Map/memcpy/Unmap will outperform UpdateSubresource for per-draw-call constant buffer updates?

    [1] https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0