Global memory coalescing for CHAR type

Gaszton · September 5, 2011, 8:11am

Hello!

My GPU is a GeForce GT 425M (compute capability 2.1).

I have a 1D array of chars in global memory:

unsigned char* cdata;
cudaMalloc(&cdata,csize);

I launch a kernel with as many threads as there are elements in cdata.
Each thread writes one char to cdata[i],
where i is the thread's global index (in my case: blockIdx.x*blockDim.x+threadIdx.x).
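A minimal sketch of the setup described above, for anyone who wants to reproduce it. The kernel name `writeChars`, the helpers `globalIndex` and `blocksFor`, the 1 MiB size, the block size of 256 and the value written (the low byte of the index) are all hypothetical, since the post does not give them. The bounds check covers the case where `csize` is not a multiple of the block size.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Global thread index for a 1D grid: blockIdx.x * blockDim.x + threadIdx.x.
__host__ __device__ inline unsigned int globalIndex(unsigned int block,
                                                    unsigned int blockSize,
                                                    unsigned int thread) {
    return block * blockSize + thread;
}

// Number of blocks needed to cover n elements, rounded up.
__host__ __device__ inline unsigned int blocksFor(size_t n, unsigned int threadsPerBlock) {
    return static_cast<unsigned int>((n + threadsPerBlock - 1) / threadsPerBlock);
}

// Hypothetical kernel: thread i writes one byte to cdata[i], so the 32 threads
// of a warp write 32 consecutive bytes.
__global__ void writeChars(unsigned char* cdata, size_t csize) {
    unsigned int i = globalIndex(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < csize) {                                      // guard the last, partial block
        cdata[i] = static_cast<unsigned char>(i & 0xFF);  // placeholder value
    }
}

int main() {
    const size_t csize = 1 << 20;        // assumed size: 1 MiB
    const unsigned int threads = 256;    // assumed block size (a multiple of 32)
    const unsigned int blocks = blocksFor(csize, threads);
    assert(blocks <= 65535);             // gridDim.x limit on compute capability 2.x

    unsigned char* cdata = nullptr;
    if (cudaMalloc(&cdata, csize) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    writeChars<<<blocks, threads>>>(cdata, csize);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(cdata);
        return 1;
    }

    // Copy the result back and check every byte.
    std::vector<unsigned char> host(csize);
    cudaMemcpy(host.data(), cdata, csize, cudaMemcpyDeviceToHost);
    size_t mismatches = 0;
    for (size_t i = 0; i < csize; ++i) {
        if (host[i] != static_cast<unsigned char>(i & 0xFF)) ++mismatches;
    }
    std::printf("mismatches: %zu\n", mismatches);
    cudaFree(cdata);
    return mismatches == 0 ? 0 : 1;
}
```

Keeping the index computation in a `__host__ __device__` helper lets the same arithmetic be checked on the host without a GPU.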

Are these global memory writes coalesced for the char type?

Each warp writes 32 × 1 byte = 32 bytes of data to global memory.
Therefore I configured global memory accesses to be cached only in L2 (with its 32-byte granularity).
Am I right in thinking that this should be faster than using both the L1 and L2 caches?
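The per-warp arithmetic in the question can be checked with a small host-only sketch. The hypothetical helper `segmentsTouched` counts how many distinct aligned segments of a given size one warp's 32 contiguous accesses fall into. It assumes unit-stride accesses and a base address aligned to the segment size (`cudaMalloc` returns memory aligned to at least 256 bytes); it only models address ranges, not the actual Fermi cache or store path.

```cuda
#include <cstddef>
#include <cstdio>

constexpr size_t kWarpSize = 32;  // threads per warp

// Hypothetical helper: number of distinct segments of segBytes (aligned to
// segBytes) touched when thread t of one warp accesses the element at byte
// offset firstByte + t * elemBytes. Assumes contiguous, unit-stride accesses,
// as in the kernel from the question.
int segmentsTouched(size_t firstByte, size_t elemBytes, size_t segBytes) {
    size_t lastByte = firstByte + (kWarpSize - 1) * elemBytes + (elemBytes - 1);
    return static_cast<int>(lastByte / segBytes - firstByte / segBytes + 1);
}

int main() {
    // char elements, warp starting on a 32-byte boundary.
    std::printf("char, aligned:   %d x 32B, %d x 128B\n",
                segmentsTouched(0, 1, 32), segmentsTouched(0, 1, 128));
    // char elements, warp starting 16 bytes into a segment.
    std::printf("char, offset 16: %d x 32B, %d x 128B\n",
                segmentsTouched(16, 1, 32), segmentsTouched(16, 1, 128));
    // 4-byte elements (e.g. int), aligned, for comparison.
    std::printf("int, aligned:    %d x 32B, %d x 128B\n",
                segmentsTouched(0, 4, 32), segmentsTouched(0, 4, 128));
    return 0;
}
```

For the aligned char case both counts are 1: the warp's 32 bytes fit in a single 32-byte segment and also inside a single 128-byte line. The model says nothing about latency or cache policy, so whether L2-only caching is actually faster still has to be measured.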

Thanks,

Gaszton
