Write Combined Memory: How does it enhance performance?

cudacuda2009 · March 24, 2010, 1:54pm

Hello,

The CUDA 2.2 Pinned Memory APIs documentation (simpleZeroCopy SDK example) says:
“Writes to WC memory are not cached in the typical sense of the word
cached. They are delayed in an internal buffer that is separate from the internal L1 and L2
caches. The buffer is not snooped and thus does not provide data coherency.”
If writes to WC memory are not cached, where does the performance gain come from?

Thanks

allanmac · March 24, 2010, 4:37pm

The performance comes from queueing up writes (combining) in order to maximize the throughput of the eventual write. This should sound similar to what’s described and encouraged in the CUDA docs.

This Intel doc from 11/1998 is a good place to start: Write Combining Memory Implementation Guidelines.

angavrilov · March 24, 2010, 5:11pm

Actually, re-reading that document carefully reveals that writes are cached in a way. Reads aren’t. This means that the CPU doesn’t have to monitor the PCI-E bus in order to keep its cache up to date, and therefore the bus can work faster. This makes sense if you use a buffer exclusively for host<->device transfer, e.g. as an intermediate for exchange between two GPUs. There might be other similar scenarios.
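To make the "transfer-only buffer" scenario concrete, here is a minimal sketch using the CUDA runtime API. It assumes a single CUDA-capable device and uses `cudaHostAlloc` with the `cudaHostAllocWriteCombined` flag for the staging buffer. The buffer names, the element count, and the `CHECK` macro are illustrative choices, not something taken from the SDK sample. The host only writes to the write-combined buffer, front to back, and never reads it; verification reads go to an ordinary pinned buffer instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper: report a failed CUDA runtime call and exit main().
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    const size_t n = 1 << 20;               // element count (arbitrary)
    const size_t bytes = n * sizeof(float);

    float *staging = nullptr;  // write-combined pinned buffer: host writes only
    float *result = nullptr;   // ordinary cacheable pinned buffer: safe to read
    float *dev = nullptr;      // device buffer

    CHECK(cudaHostAlloc((void **)&staging, bytes, cudaHostAllocWriteCombined));
    CHECK(cudaHostAlloc((void **)&result, bytes, cudaHostAllocDefault));
    CHECK(cudaMalloc((void **)&dev, bytes));

    // Sequential, write-only access lets the CPU combine stores into
    // larger bursts. Reading staging[] on the host would be very slow.
    for (size_t i = 0; i < n; ++i) {
        staging[i] = static_cast<float>(i);
    }

    // Host -> device from the WC buffer, then device -> host into the
    // cacheable buffer so the check below never reads WC memory.
    CHECK(cudaMemcpy(dev, staging, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(result, dev, bytes, cudaMemcpyDeviceToHost));

    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (result[i] != static_cast<float>(i)) ++mismatches;
    }
    std::printf("mismatches: %zu\n", mismatches);

    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(result));
    CHECK(cudaFreeHost(staging));
    return mismatches == 0 ? 0 : 1;
}
```

Whether this is actually faster than plain pinned memory depends on the platform, so it is worth timing both variants before committing to write-combined allocations.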
