Latency of write and cacheline utilization

Olumide · February 26, 2018, 10:29pm

One of the kernels that I am developing computes and then writes data to global memory.

Because it is possible to know in advance whether the data being written is “redundant” (and will therefore be discarded), I’m considering an optimization in which redundant data is simply not written to global memory.
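For concreteness, here is a minimal host-side C++ model of the proposed optimization. It is not CUDA device code. `compute_value` and `is_redundant` are hypothetical stand-ins for the kernel's real computation and redundancy test, and each loop iteration plays the role of one thread with index `i`. In an actual kernel the predicated store would reduce to `if (!is_redundant(i)) out[i] = v;`. Note that skipped slots keep whatever they held before, so any consumer must not read them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the kernel's computation and redundancy test.
float compute_value(std::size_t i) { return static_cast<float>(i) * 0.5f; }
bool is_redundant(std::size_t i) { return i % 3 == 0; }

// Host-side model of "compute, then write only non-redundant results".
// Each iteration models one CUDA thread; returns the number of stores made.
std::size_t write_non_redundant(std::vector<float>& out) {
    std::size_t stored = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        float v = compute_value(i);   // the computation still happens...
        if (!is_redundant(i)) {       // ...but the store is predicated
            out[i] = v;
            ++stored;
        }
    }
    return stored;
}
```

With 12 elements initialized to -1, indices 0, 3, 6 and 9 are treated as redundant, so 8 stores are made and those four slots keep the value -1.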

I know that CUDA memory accesses to the same cache line are coalesced. However, I don’t know whether the number of bytes written to a cache line has any impact on the latency of the write.

In other words, does the proportion of a cache line’s bytes that are actually written to global memory have an impact on the latency of a write?
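One way to make “proportion of bytes in a cache line” concrete is to count transactions. The sketch below assumes a model not stated in the question: one warp of 32 lanes, lane `k` stores a 4-byte word at byte offset `4*k` of an aligned 128-byte line, and traffic is tracked in 32-byte sectors (the sector size used by the L2 cache on many recent NVIDIA GPUs). The functions `sectors_touched` and `sector_utilization` are illustrative names only. The model counts sectors; it does not predict latency.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Assumed model: 32 lanes x 4 bytes = one aligned 128-byte line,
// split into four 32-byte sectors of 8 consecutive lanes each.
constexpr int kLanes = 32;
constexpr int kBytesPerLane = 4;
constexpr int kSectorBytes = 32;
constexpr int kLanesPerSector = kSectorBytes / kBytesPerLane;  // 8
constexpr std::uint32_t kSectorMask = (1u << kLanesPerSector) - 1u;

// Number of sectors containing at least one written byte, where bit k of
// lane_mask is set when lane k performs its (non-redundant) store.
int sectors_touched(std::uint32_t lane_mask) {
    int sectors = 0;
    for (int s = 0; s < kLanes / kLanesPerSector; ++s) {
        if (((lane_mask >> (s * kLanesPerSector)) & kSectorMask) != 0) {
            ++sectors;
        }
    }
    return sectors;
}

// Fraction of the bytes in the touched sectors that are actually written.
double sector_utilization(std::uint32_t lane_mask) {
    int sectors = sectors_touched(lane_mask);
    if (sectors == 0) return 0.0;
    int bytes_written =
        static_cast<int>(std::bitset<32>(lane_mask).count()) * kBytesPerLane;
    return static_cast<double>(bytes_written) / (sectors * kSectorBytes);
}
```

Under this model, skipping every other lane (`0x55555555`) halves the bytes written but still touches all four sectors, whereas skipping lanes 0 to 7 (`0xFFFFFF00`) removes a whole sector. Whether fewer sectors translates into lower write latency is exactly the open question above.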


