Slow memcpy performance in dual-CPU, 10 GPU system

The transfer you are now showing is a pageable transfer. Pageable transfers place “additional” stress on the host memory subsystem, since the data is staged through an intermediate pinned buffer on the host before it is transferred to the device, and they may have other issues that affect both measurement and performance. It's recommended to use pinned buffers/transfers for best transfer throughput, and it's possible that the pageable transfers are contributing to the measurement variability as well as reducing performance.
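If it helps to compare the two cases directly, here is a minimal sketch that times host-to-device `cudaMemcpy` from a pageable (`malloc`) buffer and from a pinned (`cudaMallocHost`) buffer. The helper name `measure_h2d_gbps`, the 256 MiB buffer size, and the repetition count are my own assumptions for illustration, not taken from your code. It uses whichever device is current and does nothing about CPU/NUMA affinity, which may also matter on a dual-socket system.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical helper: returns host-to-device throughput in GB/s for a
// pinned (pinned == true) or pageable (pinned == false) host buffer.
double measure_h2d_gbps(bool pinned, size_t bytes, int reps) {
    void* host = nullptr;
    void* dev = nullptr;

    if (pinned) {
        CHECK(cudaMallocHost(&host, bytes));   // page-locked host memory
    } else {
        host = malloc(bytes);                  // ordinary pageable memory
        if (host == nullptr) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    }
    memset(host, 1, bytes);                    // touch every page before timing
    CHECK(cudaMalloc(&dev, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up copy so one-time setup costs are not included in the timing.
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));
    if (pinned) { CHECK(cudaFreeHost(host)); } else { free(host); }

    // Total bytes moved divided by elapsed seconds, reported in GB/s.
    return (static_cast<double>(bytes) * reps / 1e9) / (ms / 1e3);
}

int main() {
    const size_t bytes = 256u << 20;   // 256 MiB per copy (assumed size)
    const int reps = 10;               // assumed repetition count

    printf("pageable H2D: %.2f GB/s\n", measure_h2d_gbps(false, bytes, reps));
    printf("pinned   H2D: %.2f GB/s\n", measure_h2d_gbps(true, bytes, reps));
    return 0;
}
```

Build it with something like `nvcc -o h2d_compare h2d_compare.cu` (the file name is arbitrary). Running it a few times should give an idea of both the throughput gap and how much the pageable number varies from run to run on your system.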