Can write-combining memory slow down your application?

NVIDIA_CUDA_ProgrammingGuide_2.3.pdf says that "write-combining memory is not snooped during transfers across the PCI Express bus, which can improve transfer performance by up to 40%."

But when I run bandwidthTest.exe, write-combining memory does not improve transfer performance at all. The host-to-device bandwidth for pinned memory is exactly the same with and without it.
(1st test: “bandwidthTest -memory=pinned -wc”, 2nd test: “bandwidthTest -memory=pinned”)
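For anyone who wants to reproduce the comparison outside bandwidthTest, here is a minimal sketch of the same measurement. It is not the bandwidthTest source; the helper name `h2dBandwidthGBs`, the 32 MiB buffer size, and the repetition count are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: time host-to-device copies from a pinned buffer
// allocated with the given cudaHostAlloc flags, and return GB/s.
static double h2dBandwidthGBs(unsigned int flags, size_t bytes, int reps) {
    void *h = nullptr, *d = nullptr;
    CHECK(cudaHostAlloc(&h, bytes, flags));
    CHECK(cudaMalloc(&d, bytes));
    memset(h, 1, bytes);  // the host only writes to the buffer, never reads it

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));

    return (double)bytes * reps / 1e9 / (ms / 1e3);
}

int main() {
    const size_t bytes = 32u << 20;  // 32 MiB, an assumed test size
    const int reps = 10;             // assumed repetition count

    printf("pinned:                  %.2f GB/s\n",
           h2dBandwidthGBs(cudaHostAllocDefault, bytes, reps));
    printf("pinned + write-combined: %.2f GB/s\n",
           h2dBandwidthGBs(cudaHostAllocWriteCombined, bytes, reps));
    return 0;
}
```

If the two numbers match on a given machine, that reproduces the result described above; the figures depend on the system, so none are shown here.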

Worse, copying data from pageable host memory into host write-combining memory is slower than copying it into ordinary pinned memory. So when you use write-combining memory, overall application performance drops.

Is this result reasonable?

Well, when memory is write-combined, it’s not cached on the CPU, so it makes sense that host-side bandwidth to that buffer would be reduced. In other words, it makes the PCIe transfer faster at the expense of CPU access time.
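To see the host-side cost in isolation, a rough sketch like the one below times a plain memcpy from pageable memory into each kind of pinned buffer. The helper name `fillSeconds` and the 64 MiB size are assumptions for illustration, and a single timed copy is only a coarse measurement.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: time one sequential memcpy from pageable memory
// into a pinned buffer allocated with the given cudaHostAlloc flags.
static double fillSeconds(unsigned int flags, const void *src, size_t bytes) {
    void *dst = nullptr;
    cudaError_t e = cudaHostAlloc(&dst, bytes, flags);
    if (e != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(e));
        exit(1);
    }

    memcpy(dst, src, bytes);  // warm-up pass, not timed

    auto t0 = std::chrono::steady_clock::now();
    memcpy(dst, src, bytes);  // write-only, sequential access to dst
    auto t1 = std::chrono::steady_clock::now();

    cudaFreeHost(dst);
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB, an assumed test size

    // Ordinary pageable source buffer, touched once so its pages exist.
    void *src = malloc(bytes);
    if (!src) return 1;
    memset(src, 1, bytes);

    double plain = fillSeconds(cudaHostAllocDefault, src, bytes);
    double wc    = fillSeconds(cudaHostAllocWriteCombined, src, bytes);

    printf("fill pinned:                  %.2f GB/s\n", bytes / 1e9 / plain);
    printf("fill pinned + write-combined: %.2f GB/s\n", bytes / 1e9 / wc);

    free(src);
    return 0;
}
```

The sketch only ever writes to the write-combined buffer. Reading it back from the CPU is the access pattern to avoid, since those reads bypass the cache.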

But according to my test, the PCIe transfer speed is the same whether or not the host memory is write-combined. When should we use write-combining memory? Judging by these results, it seems useless.

Depends on the chipset.

Thank you, tmurray!

What kind of chipset benefits from it? Mine is an Intel 5520.

Which chipset does it depend on: the GPU chipset, the motherboard chipset, or both?