Hi,

it’s longest common subsequence problem.

I have an N x M matrix of integers (which is the result of the dynamic algorithm in LCS).

I’m implementing an algorithm which requires the following to be performed:

There are M+N anti diagonals in this matrix. (anti - diagonals are parallel lines from top edge to left edge and right edge to bottom edge)

I need a for loop with M+N iterations with each iteration computing one anti-diagonal starting from the top left and ending at bottom right.

In each iteration, all the elements in that anti-diagonal must run parallelly.

Each anti-diagonal is calculated based on the values of the previous anti-diagonal. (S[i][j] depend on S[i][j-1], S[i-1][j] and S[i-1][j-1])

this is the code written in C language.

for (i=0;i<N+1;i++)

C[i][0]=0;

for (j=1;j<M+1;j++)

C[0][j]=0;

for (i=1;i<N+1;i++)

for(j=1;j<M+1;j++)

{

if (X[j-1]==Y[i-1])

C[i][j]=C[i-1][j-1]+1;

else

C[i][j]=max(C[i-1][j],C[i][j-1]);

}

X and Y are string (char*) C is matrix of integer N+1*M+1

So, how do I index the threads with this requirement in CUDA?

p.s. I’m very new to the whole CUDA programming topic and mostly new to parallel processing so please bear with me