sequential memory read and write in a thread block


I need to access the same global memory address, for both reading and writing, from many threads in a block.

Is there any way to access the address sequentially?

Thank you,

Are you talking about sequential consistency or linear indexing? For the latter, it's just e.g. arr[blockIdx.x * blockDim.x + threadIdx.x] for each thread to get one element with 1-D blocks.
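In case linear indexing is what is meant, here is a minimal sketch. The kernel name `scale`, the host helper `scale_on_device`, and the block size of 256 are assumptions made up for illustration, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread owns exactly one element, so no two threads touch the same address.
__global__ void scale(float *arr, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global linear index
    if (i < n)                                      // guard the last, partially filled block
        arr[i] *= factor;
}

// Hypothetical host helper: copies the data to the device, runs the kernel,
// and copies the result back. Assumes host is not empty.
std::vector<float> scale_on_device(std::vector<float> host, float factor)
{
    int n = static_cast<int>(host.size());
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;                              // assumed block size
    int blocks = (n + threads - 1) / threads;       // round up to cover all n elements
    scale<<<blocks, threads>>>(dev, n, factor);

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

With this layout every thread reads and writes its own element, so no ordering between threads is needed.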

Thank you for your reply.

For example,

several threads in a block access the same memory address, like:


If the threads simultaneously read and write elf[3], it causes a problem. So I'm looking for some trick that lets each thread access this address sequentially.

If you can do a parallel reduction, this is probably the best alternative.
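A minimal sketch of the reduction approach, assuming the threads are each adding an int contribution to elf[3] (the original post does not say what the update is). The names `block_sum`, `reduce_into_elf3`, `contrib`, and `BLOCK` are made up for illustration; the kernel assumes a single block and at most `BLOCK` contributions.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256  // assumed block size; must be a power of two for this loop

// Tree reduction in shared memory: the block combines its values first,
// and only thread 0 touches elf[3].
__global__ void block_sum(const int *contrib, int *elf, int n)
{
    __shared__ int partial[BLOCK];
    int tid = threadIdx.x;

    partial[tid] = (tid < n) ? contrib[tid] : 0;    // pad with the identity of +
    __syncthreads();

    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();                            // finish this level before the next
    }

    if (tid == 0)
        elf[3] += partial[0];                       // single writer, no conflict
}

// Hypothetical host helper: elf has 4 ints, elf[3] starts at `initial`.
// Assumes 0 < contrib.size() <= BLOCK.
int reduce_into_elf3(const std::vector<int> &contrib, int initial)
{
    int n = static_cast<int>(contrib.size());
    int elf_host[4] = {0, 0, 0, initial};
    int *d_contrib = nullptr, *d_elf = nullptr;
    cudaMalloc(&d_contrib, n * sizeof(int));
    cudaMalloc(&d_elf, sizeof(elf_host));
    cudaMemcpy(d_contrib, contrib.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_elf, elf_host, sizeof(elf_host), cudaMemcpyHostToDevice);

    block_sum<<<1, BLOCK>>>(d_contrib, d_elf, n);

    cudaMemcpy(elf_host, d_elf, sizeof(elf_host), cudaMemcpyDeviceToHost);
    cudaFree(d_contrib);
    cudaFree(d_elf);
    return elf_host[3];
}
```

This only works when the per-thread updates can be combined with an associative operation such as +, min, or max. If the order of the updates matters, one of the other options is a better fit.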

If there are just a few elements, then you could have each thread write its value to a shared-memory array, call __syncthreads(), and then have one thread update elf[3], for example.
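Roughly, that could look like the sketch below. The update rule `acc = acc * 3 + value` is an arbitrary order-dependent stand-in chosen for illustration, since the real update is not given; `serialize_updates` and `apply_in_order` are likewise made-up names. It assumes a single block with one thread per value.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every thread stages its value in shared memory; after the barrier,
// thread 0 alone applies the updates to elf[3] in thread-index order.
__global__ void serialize_updates(const int *values, int *elf)
{
    extern __shared__ int staged[];                 // one int per thread, sized at launch
    staged[threadIdx.x] = values[threadIdx.x];
    __syncthreads();                                // all values are staged past this point

    if (threadIdx.x == 0) {
        int n = static_cast<int>(blockDim.x);
        int acc = elf[3];
        for (int t = 0; t < n; ++t)
            acc = acc * 3 + staged[t];              // placeholder, order-dependent update
        elf[3] = acc;
    }
}

// Hypothetical host helper. Assumes 0 < values.size() <= 1024 (one block).
int apply_in_order(const std::vector<int> &values, int initial)
{
    int n = static_cast<int>(values.size());
    int elf_host[4] = {0, 0, 0, initial};
    int *d_values = nullptr, *d_elf = nullptr;
    cudaMalloc(&d_values, n * sizeof(int));
    cudaMalloc(&d_elf, sizeof(elf_host));
    cudaMemcpy(d_values, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_elf, elf_host, sizeof(elf_host), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory for staged[].
    serialize_updates<<<1, n, n * sizeof(int)>>>(d_values, d_elf);

    cudaMemcpy(elf_host, d_elf, sizeof(elf_host), cudaMemcpyDeviceToHost);
    cudaFree(d_values);
    cudaFree(d_elf);
    return elf_host[3];
}
```

For example, `apply_in_order({1, 2, 3}, 0)` yields 18, which is ((0 * 3 + 1) * 3 + 2) * 3 + 3. The serial loop is the price of a guaranteed order, so this is only reasonable for a handful of values.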

Atomic ops are also reasonably fast on the GT200 hardware if you don't need incredible performance.
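A minimal sketch with atomicAdd, assuming elf is an int array and the update is an increment (neither is stated in the question). `bump_elf3` and `count_with_atomics` are made-up names for illustration.

```cuda
#include <cuda_runtime.h>

// Every thread in every block adds 1 to elf[3]. atomicAdd makes each
// read-modify-write indivisible, so no increment is lost.
__global__ void bump_elf3(int *elf)
{
    atomicAdd(&elf[3], 1);
}

// Hypothetical host helper: launches blocks x threads threads and returns elf[3].
int count_with_atomics(int blocks, int threads)
{
    int elf_host[4] = {0, 0, 0, 0};
    int *d_elf = nullptr;
    cudaMalloc(&d_elf, sizeof(elf_host));
    cudaMemcpy(d_elf, elf_host, sizeof(elf_host), cudaMemcpyHostToDevice);

    bump_elf3<<<blocks, threads>>>(d_elf);

    cudaMemcpy(elf_host, d_elf, sizeof(elf_host), cudaMemcpyDeviceToHost);
    cudaFree(d_elf);
    return elf_host[3];
}
```

Here `count_with_atomics(4, 256)` returns 1024. Note that atomics guarantee that no update is lost, but not the order in which the threads get their turn.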

Last, if you have threads which only occasionally conflict, you can try to communicate a write order between the threads (either explicitly through shared memory or implicitly through a thread-local calculation), and call __syncthreads() between every write step. I have an example of that in one of my repos…bit_util.h#l284, but admittedly it's not the easiest to read. I'm writing variable-length integers (between 1 and 8 bits) to an array, and each thread gets a write index corresponding to its offset within a byte (equal to the number of other threads writing into the same byte with offsets lower than its own). The important general idea is in the for loop on line 323, where each group of writes executes and then syncs.
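To make the general idea concrete, here is a simplified toy version. It is not the code from bit_util.h: the names `ordered_writes` and `run_ordered_writes`, the `slot`/`value` arrays, the placeholder update `out = out * 10 + value`, and the `max_steps` bound are all assumptions for illustration. It assumes a single block and that no slot is targeted by more than `max_steps` threads.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread wants to update out[slot[tid]]. Threads that share a slot
// are ranked by thread index and take turns, one rank per step.
__global__ void ordered_writes(const int *slot, const int *value,
                               int *out, int max_steps)
{
    extern __shared__ int s_slot[];                 // one int per thread
    int tid = threadIdx.x;
    s_slot[tid] = slot[tid];
    __syncthreads();

    // Write order: number of lower-numbered threads aiming at the same slot.
    int rank = 0;
    for (int t = 0; t < tid; ++t)
        if (s_slot[t] == s_slot[tid])
            ++rank;

    // At most one thread per slot is active in any step.
    for (int step = 0; step < max_steps; ++step) {
        if (rank == step)
            out[s_slot[tid]] = out[s_slot[tid]] * 10 + value[tid];
        __syncthreads();                            // this step's writes are visible to the next
    }
}

// Hypothetical host helper: out starts as num_slots zeros.
// Assumes 0 < slot.size() <= 1024 and slot.size() == value.size().
std::vector<int> run_ordered_writes(const std::vector<int> &slot,
                                    const std::vector<int> &value,
                                    int num_slots, int max_steps)
{
    int n = static_cast<int>(slot.size());
    std::vector<int> out(num_slots, 0);
    int *d_slot = nullptr, *d_value = nullptr, *d_out = nullptr;
    cudaMalloc(&d_slot, n * sizeof(int));
    cudaMalloc(&d_value, n * sizeof(int));
    cudaMalloc(&d_out, num_slots * sizeof(int));
    cudaMemcpy(d_slot, slot.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_value, value.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, out.data(), num_slots * sizeof(int), cudaMemcpyHostToDevice);

    ordered_writes<<<1, n, n * sizeof(int)>>>(d_slot, d_value, d_out, max_steps);

    cudaMemcpy(out.data(), d_out, num_slots * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_slot);
    cudaFree(d_value);
    cudaFree(d_out);
    return out;
}
```

With slots {0, 0, 1, 0, 1}, values {1, 2, 3, 4, 5}, and max_steps = 3, slot 0 ends up as 124 and slot 1 as 35. The loop bound has to be the same for every thread in the block, since __syncthreads() must be reached by all of them.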

Feel free to IM me [gatoatigrado at gmail] if my explanation isn't clear.