my problem is this one.
I have a matrix, say U, initialized to zero, that has to be filled in a particular way. If the element U(i,j) should be set to a value different from zero, then also the block sized (2K+1)x(2K+1) and centered on the considered element has to be set to values different from zero (K is a parameter that can be chosen). Accordingly if two elements of the matrix, that should be set, are distant less or equal to 2K+1 in coloumns or in rows, the relative blocks overlaps and in the overlapping positions the values should be the sum of the corresponding positions in the two blocks.
Due to the nature of the algorithm I chose this implementation. I defined a block of threads sized (2K+1)x(2K+1) and a properly sized grid of block that allows to cover all elements of the matrix that should be set. Each block of threads is responsible to evaluate the (2K+1)x(2K+1) sub-metrix and to put it into the final matrix, so each thread computes a value of the sub-matrix. I exploited a shared memory area sized (2K+1) x (2K+1) and after all the evaluation of the block of threads the sub-matrix is copied back to the global memory into the original matrix U.
But what happens when overlaps of sub-matrix occur? In particular, if two threads of two different block of threads should write into the same global memory area, say U(h,k), the last writing overwrites the other. For this reason I used a workaround solution, the thread READS the U(h,k) and immediately after (i.e. next instruction in the code) adds the computed value.
Nevertheless, I come now to another doubt. I’m wondering if the following situation could happen:
assume that U(h,k)=0;
TIME 0, THREAD A BLOCK X READS U(h,k)=0
TIME 1, THREAD B BLOCK Y READS U(h,k)=0
TIME 2, THREAD A BLOCK X WRITES new U(h,k)= U(h,k) RED + alpha = 0+ alpha=alpha;
TIME 3, THREAD B BLOCK Y WRITES new U(h,k) = U(h,k) RED + beta = 0+ beta=beta;
The final correct value of U(h,k) should be: alpha+beta, but the result is U(h,k)=beta;
Is it possible that the above described sequence of events could present? Is there a way to sync the threads of different blocks in order to be sure that if a thread acces to U(h,k) to READ the same thread MUST write the new value before other threads of other blocks READ U(h,k)?