I have a NxN matrix. I have to update each element using a formula that needs the previous elements.

For example, to compute a value of a[i][j] I need a date value of a[i-1][j] and a[i][j-1].

How is it possible to parallelize a case of this kind?

I have a NxN matrix. I have to update each element using a formula that needs the previous elements.

For example, to compute a value of a[i][j] I need a date value of a[i-1][j] and a[i][j-1].

How is it possible to parallelize a case of this kind?

There are at least two general strategies.

- Process elements in the diagonal in parallel, one diagonal at a time.
- Process several diagonals in parallel by redundantly computing results, similar to a carry-lookahead adder.

Both of these assume a top-down partitioning of the work, and the downside is the global synchronization needed between each step.

You may also be able to make most of the synchronization local with a bottom-up partitioning of the work, but it only works if your operations are commutative and you can split your operation into separate downsweep and upsweep phases (see [1] for more details). You would need to take that idea and extend it to 2 dimensions.

[1] Section 3.4 of Duane Merril’s work on Scan - http://back40computing.googlecode.com/svn-history/wiki/documents/ScanTR2.pdf