The CUDA Programming Guide 2.3 (NVIDIA_CUDA_ProgrammingGuide_2.3.pdf) says that "write-combining memory is not snooped during transfers across the PCI Express bus, which can improve transfer performance by up to 40%."
But when I run bandwidthTest.exe, write-combining memory does not improve transfer performance at all: the host-to-device bandwidth for pinned memory is exactly the same with and without it.
(1st test: “bandwidthTest -memory=pinned -wc”, 2nd test: “bandwidthTest -memory=pinned”)
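To reproduce the same comparison outside bandwidthTest, a minimal sketch like the one below could be used. It assumes the CUDA 2.2+ runtime API (`cudaHostAlloc` with the `cudaHostAllocDefault` and `cudaHostAllocWriteCombined` flags) and times host-to-device copies with CUDA events. The buffer size, repetition count and the helper names `bandwidth_mb_s` and `h2d_mb_s` are arbitrary, hypothetical choices, not taken from bandwidthTest itself.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Bandwidth in MB/s (1 MB = 2^20 bytes) for `bytes` moved in `ms` milliseconds.
double bandwidth_mb_s(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / (1 << 20)) / (ms / 1000.0);
}

// Time `reps` host-to-device copies from a pinned buffer allocated with the
// given cudaHostAlloc flags. Returns MB/s, or -1.0 if an allocation fails.
double h2d_mb_s(size_t bytes, unsigned int flags, int reps) {
    void* h = nullptr;
    void* d = nullptr;
    if (cudaHostAlloc(&h, bytes, flags) != cudaSuccess) return -1.0;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { cudaFreeHost(h); return -1.0; }

    // The host only writes to the buffer; host reads from WC memory are slow.
    std::memset(h, 1, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // warm-up copy, not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, 0);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
    return bandwidth_mb_s(bytes * reps, ms);
}

int main() {
    const size_t bytes = 32 << 20;  // 32 MB per copy (arbitrary)
    const int reps = 10;
    double plain = h2d_mb_s(bytes, cudaHostAllocDefault, reps);
    double wc = h2d_mb_s(bytes, cudaHostAllocWriteCombined, reps);
    if (plain < 0 || wc < 0) {
        std::fprintf(stderr, "pinned allocation failed\n");
        return 1;
    }
    std::printf("pinned:          %.1f MB/s\n", plain);
    std::printf("write-combined:  %.1f MB/s\n", wc);
    return 0;
}
```

Only the allocation flag differs between the two runs, so any gap between the two printed numbers should come from the write-combining attribute alone.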
Worse, copying data from pageable host memory into write-combining host memory is slower than copying it into ordinary pinned memory. So when you use write-combining memory, the performance of the application as a whole is degraded.
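The staging step described above (pageable buffer copied into a pinned buffer on the host) can be timed separately with a sketch like the following. It assumes a plain `std::memcpy` from a `malloc`ed buffer is representative of how the application fills the pinned buffer; the helper names `mb_per_s` and `pageable_to_pinned_mb_s`, the buffer size and the repetition count are hypothetical choices for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Bandwidth in MB/s (1 MB = 2^20 bytes) for `bytes` moved in `seconds`.
double mb_per_s(double bytes, double seconds) {
    return bytes / (1 << 20) / seconds;
}

// Time `reps` host-side memcpy calls from a pageable (malloc) buffer into a
// pinned buffer allocated with the given cudaHostAlloc flags.
// Returns MB/s, or -1.0 if an allocation fails.
double pageable_to_pinned_mb_s(size_t bytes, unsigned int flags, int reps) {
    char* src = static_cast<char*>(std::malloc(bytes));
    if (src == nullptr) return -1.0;
    void* dst = nullptr;
    if (cudaHostAlloc(&dst, bytes, flags) != cudaSuccess) {
        std::free(src);
        return -1.0;
    }
    std::memset(src, 1, bytes);
    std::memcpy(dst, src, bytes);  // warm-up copy, not timed

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) std::memcpy(dst, src, bytes);
    auto t1 = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(t1 - t0).count();
    cudaFreeHost(dst);
    std::free(src);
    return mb_per_s(static_cast<double>(bytes) * reps, seconds);
}

int main() {
    const size_t bytes = 32 << 20;  // 32 MB per copy (arbitrary)
    const int reps = 10;
    double plain = pageable_to_pinned_mb_s(bytes, cudaHostAllocDefault, reps);
    double wc = pageable_to_pinned_mb_s(bytes, cudaHostAllocWriteCombined, reps);
    if (plain < 0 || wc < 0) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }
    std::printf("pageable -> pinned:          %.1f MB/s\n", plain);
    std::printf("pageable -> write-combined:  %.1f MB/s\n", wc);
    return 0;
}
```

Adding this host-side time to the device transfer time from the bandwidth measurement gives the end-to-end cost that matters for the application.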
Are these results reasonable?