2D Coalesced access pattern

I am not sure I understand the API guide's description of coalesced access patterns for 2D arrays.

Suppose I have a 2D array of 512x512 in global memory.

I want some of my blocks to do the following:

  1. read a single column from the 2D array into shared memory.
  2. process it and write the result column back to the same location in the 2D array.

While other blocks do:

  1. read a single row from the 2D array into shared memory.
  2. process it and write the result row back to the same location in the 2D array.

Is it possible to do both the column and the row reads/writes in a COALESCED manner?

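I believe you can only access either rows or columns in a coalesced manner, depending on the memory layout. In a row-major array, consecutive elements of a row are adjacent in memory, while consecutive elements of a column are a full row apart.
To achieve good performance you could try subdividing your array so that each block reads a subarray in which the reads and writes are at least partially coalesced. That should be faster than the naive approach.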
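
Here is a rough sketch of what that subdivision could look like. It is not a tuned implementation, and it assumes things the thread does not state: a row-major 512x512 array of float, a block that owns 16 adjacent columns, and a placeholder process() function standing in for the real work. All names (processRow, processColumnStrip, runStripDemo) are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define N 512      // array is N x N floats, row-major (assumption)
#define STRIP 16   // adjacent columns handled by one block (assumption)

// Hypothetical stand-in for the real per-element processing.
__host__ __device__ inline float process(float x) { return 2.0f * x + 1.0f; }

// Row case: thread t touches a[row * N + t], so neighbouring threads touch
// neighbouring addresses. Launch with N threads in one block.
__global__ void processRow(float *a, int row)
{
    __shared__ float buf[N];
    int t = threadIdx.x;
    buf[t] = a[row * N + t];
    __syncthreads();               // real processing may read other threads' entries
    a[row * N + t] = process(buf[t]);
}

// Column case: one block handles STRIP adjacent columns, one STRIP x STRIP
// tile at a time. threadIdx.x runs along a row, so every group of STRIP
// threads reads STRIP consecutive floats instead of one float per row.
// Launch with a STRIP x STRIP block; firstCol should be a multiple of STRIP.
__global__ void processColumnStrip(float *a, int firstCol)
{
    __shared__ float tile[STRIP][STRIP];
    int col = firstCol + threadIdx.x;
    for (int row0 = 0; row0 < N; row0 += STRIP) {
        int row = row0 + threadIdx.y;
        tile[threadIdx.y][threadIdx.x] = a[row * N + col];
        __syncthreads();
        a[row * N + col] = process(tile[threadIdx.y][threadIdx.x]);
        __syncthreads();           // tile is reused by the next iteration
    }
}

// Host helper: runs both kernels and returns the number of mismatches
// against a CPU reference (-1 on a CUDA error).
int runStripDemo(int row, int firstCol)
{
    std::vector<float> h(N * N), ref(N * N);
    for (int i = 0; i < N * N; ++i) h[i] = ref[i] = float(i % 7);
    for (int c = 0; c < N; ++c) ref[row * N + c] = process(ref[row * N + c]);
    for (int r = 0; r < N; ++r)
        for (int c = firstCol; c < firstCol + STRIP; ++c)
            ref[r * N + c] = process(ref[r * N + c]);

    float *d = 0;
    size_t bytes = N * N * sizeof(float);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return -1;
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    processRow<<<1, N>>>(d, row);
    processColumnStrip<<<1, dim3(STRIP, STRIP)>>>(d, firstCol);
    cudaError_t err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < N * N; ++i) bad += (h[i] != ref[i]);
    return bad;
}
```

Calling runStripDemo(3, 32) processes row 3 and columns 32 to 47 and should return 0 if the device results match the CPU reference. Because the strip is walked one tile at a time, shared memory use stays at 16x16 floats, but that only works when the processing does not need a whole column resident at once.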

You could also try reading through a texture; the texture cache may give you a speedup over scattered global reads.
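
A minimal sketch of the texture idea follows, assuming a toolkit and GPU that support texture objects (cudaTextureObject_t), a row-major 512x512 float array, and a separate output buffer so the kernel never reads through the texture what it has just written. The names readColumnViaTexture and runTextureDemo are hypothetical, and whether this is actually faster than plain strided reads has to be measured on your hardware.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

#define N 512   // array is N x N floats, row-major (assumption)

// Each thread fetches one element of column `col` through the texture path.
// The addresses are N floats apart, so a plain global load would be strided.
__global__ void readColumnViaTexture(cudaTextureObject_t tex, float *out, int col)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N)
        out[row] = tex1Dfetch<float>(tex, row * N + col);
}

// Host helper: returns the number of mismatches against the host copy
// of the column (-1 on a CUDA error).
int runTextureDemo(int col)
{
    std::vector<float> h(N * N), column(N);
    for (int i = 0; i < N * N; ++i) h[i] = float(i % 13);

    float *dIn = 0, *dOut = 0;
    size_t bytes = N * N * sizeof(float);
    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&dOut, N * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return -1;
    }
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the linear device buffer as a 1D texture of floats.
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dIn;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, NULL) != cudaSuccess) {
        cudaFree(dIn);
        cudaFree(dOut);
        return -1;
    }

    readColumnViaTexture<<<N / 128, 128>>>(tex, dOut, col);
    cudaError_t err = cudaMemcpy(column.data(), dOut, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int r = 0; r < N; ++r) bad += (column[r] != h[r * N + col]);
    return bad;
}
```

For example, runTextureDemo(5) reads column 5 and should return 0 when every fetched value matches the host data.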

Look at the transpose example in the SDK to see how to coalesce in both reads and writes.

You can also download this presentation; page 35 has a good explanation of how it works.
http://web.comlab.ox.ac.uk/oucl/work/mike....tion_Harris.pdf (Mike Giles - Home Page)
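
For reference, the general idea behind a shared-memory transpose can be sketched as below. This is written from the usual description of the technique, not copied from the SDK sample, and the names (transposeTiled, runTransposeDemo) and the 16x16 tile size are assumptions. Each block reads a tile row by row, then writes it out row by row with the indices swapped in shared memory, so neighbouring threads touch neighbouring global addresses on both the read and the write.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 16   // tile edge; launch with a TILE x TILE block (assumption)

// Transposes an n x n row-major matrix.
__global__ void transposeTiled(float *out, const float *in, int n)
{
    // One extra column of padding to reduce shared memory bank conflicts
    // when the tile is read column-wise below.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;   // column in the input
    int y = blockIdx.y * TILE + threadIdx.y;   // row in the input
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();

    // Swap the block indices so threadIdx.x still runs along an output row.
    x = blockIdx.y * TILE + threadIdx.x;       // column in the output
    y = blockIdx.x * TILE + threadIdx.y;       // row in the output
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

// Host helper: returns the number of mismatches against a CPU transpose
// (-1 on a CUDA error).
int runTransposeDemo(int n)
{
    std::vector<float> h(n * n), t(n * n);
    for (int i = 0; i < n * n; ++i) h[i] = float(i % 101);

    float *dIn = 0, *dOut = 0;
    size_t bytes = size_t(n) * n * sizeof(float);
    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&dOut, bytes) != cudaSuccess) {
        cudaFree(dIn);
        return -1;
    }
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(dOut, dIn, n);

    cudaError_t err = cudaMemcpy(t.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            bad += (t[c * n + r] != h[r * n + c]);
    return bad;
}
```

runTransposeDemo(512) transposes a 512x512 matrix and should return 0 when the result matches the CPU transpose.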

Thanks for the link!

I am not sure that the approach in the transpose example will help me.
I need to read one column in one block and one row in a different block (actually it can be more than one, but the rows/columns are not adjacent).
If I understand correctly, the “transpose” example approach can work if I read, say, 16 columns at a time. I will not have enough shared memory to store 16 rows.
Did I misunderstand the example? Is there another way to do it?