Interpretation of coalesced global memory access for a 3D block: is it coalesced only if tid is used?

Vijaeendra · November 23, 2011, 6:41am

I understand that coalesced memory access gives the best performance when tid (the thread ID) is used to index the data. But when accessing linear memory that holds 3D image data, how should coalescing be interpreted?

As in the following case.

unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
unsigned int k = blockIdx.z * blockDim.z + threadIdx.z;

size_t index = i + (j * resX) + (k * frameSize);
imageMap[index] = maxValue;

If the above access turns out to be inefficient, what other approaches can be followed?

Vijaeendra · November 23, 2011, 6:46am

I have gone through the Strided Accesses section in the ‘CUDA Best Practices’ guide, but I need a deeper understanding of it.

Awaiting a quick reply.

pQB · November 23, 2011, 9:55am

If you imagine the volume laid out linearly in memory, for example a 4 x 4 x 2 cube:

// first slice of the cube
 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15

// second slice of the cube
16 17 18 19
20 21 22 23
24 25 26 27
28 29 30 31

// linearly arranged
0 1 2 3 4 ... 12 13 14 15 16 17 ... 28 29 30 31

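and keep in mind that x is the fastest-varying dimension (then y, then z), then your index scheme:

size_t index = i + (j * resX) + (k * frameSize); // resX = width; frameSize = number of elements in one slice (width * height)

gives a coalesced access pattern: threads with consecutive threadIdx.x access consecutive elements of imageMap.

Consecutive threads accessing consecutive addresses is the main requirement for coalescing, and it is satisfied in your example. Note that there are other conditions (such as alignment), and some of them have been relaxed for Compute Capability 1.2 and above.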
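One thing worth checking is the block shape, because threads are grouped into warps by their linear ID (x fastest, then y, then z). The sketch below is host-side C++, not a kernel: it models the index arithmetic from the question and counts how many aligned memory segments one warp would touch. The helper name segmentsTouchedByWarp, the Dim3 struct, the 128-byte segment size, and the assumptions that imageMap starts on a segment boundary and that the block is block (0,0,0) are all illustrative choices of mine, not something taken from the guide. Treat it as a rough model that ignores caches and architecture-specific rules.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for CUDA's dim3, for host-side modelling only.
struct Dim3 { unsigned x, y, z; };

// Counts the distinct aligned segments touched by one warp of block (0,0,0)
// when every thread accesses imageMap[i + j*resX + k*frameSize].
// Assumes imageMap starts on a segment boundary (an assumption of this sketch).
std::size_t segmentsTouchedByWarp(Dim3 blockDim, unsigned warpId,
                                  std::size_t resX, std::size_t frameSize,
                                  std::size_t elemSize,
                                  std::size_t segmentBytes = 128,
                                  unsigned warpSize = 32)
{
    assert(blockDim.x > 0 && blockDim.y > 0 && blockDim.z > 0);
    assert(segmentBytes > 0);

    std::set<std::size_t> segments;
    const unsigned threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;

    for (unsigned lane = 0; lane < warpSize; ++lane) {
        const unsigned t = warpId * warpSize + lane;  // linear thread ID in the block
        if (t >= threadsPerBlock) break;

        // Threads are linearized with x fastest, then y, then z.
        const unsigned tx = t % blockDim.x;
        const unsigned ty = (t / blockDim.x) % blockDim.y;
        const unsigned tz = t / (blockDim.x * blockDim.y);

        // For block (0,0,0): i = tx, j = ty, k = tz.
        const std::size_t index = tx + ty * resX + tz * frameSize;
        segments.insert(index * elemSize / segmentBytes);
    }
    return segments.size();
}
```

With 4-byte elements, resX = 256 and frameSize = 256 * 256, a 32 x 4 x 1 block gives 1 segment for warp 0 (fully coalesced), an 8 x 8 x 1 block gives 4, and a 4 x 4 x 4 block gives 8. Under this model, making blockDim.x a multiple of the warp size keeps each warp inside a single row of the image.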

I recommend reading Section G.3.2, Global Memory, of the NVIDIA CUDA C Programming Guide, v3.2.

Hope this helps.

PS: Any review is welcome.
