 # Longest Common Subsequence in CUDA

Hi,

it’s longest common subsequence problem.

I have an N x M matrix of integers (which is the result of the dynamic algorithm in LCS).

I’m implementing an algorithm which requires the following to be performed:

There are M+N anti diagonals in this matrix. (anti - diagonals are parallel lines from top edge to left edge and right edge to bottom edge)

I need a for loop with M+N iterations with each iteration computing one anti-diagonal starting from the top left and ending at bottom right.

In each iteration, all the elements in that anti-diagonal must run parallelly.

Each anti-diagonal is calculated based on the values of the previous anti-diagonal. (S[i][j] depend on S[i][j-1], S[i-1][j] and S[i-1][j-1])

this is the code written in C language.

for (i=0;i<N+1;i++)
C[i]=0;
for (j=1;j<M+1;j++)
C[j]=0;

for (i=1;i<N+1;i++)
for(j=1;j<M+1;j++)
{
if (X[j-1]==Y[i-1])
C[i][j]=C[i-1][j-1]+1;
else
C[i][j]=max(C[i-1][j],C[i][j-1]);
}

X and Y are string (char*) C is matrix of integer N+1*M+1

So, how do I index the threads with this requirement in CUDA?

p.s. I’m very new to the whole CUDA programming topic and mostly new to parallel processing so please bear with me Hi,

it’s longest common subsequence problem.

I have an N x M matrix of integers (which is the result of the dynamic algorithm in LCS).

I’m implementing an algorithm which requires the following to be performed:

There are M+N anti diagonals in this matrix. (anti - diagonals are parallel lines from top edge to left edge and right edge to bottom edge)

I need a for loop with M+N iterations with each iteration computing one anti-diagonal starting from the top left and ending at bottom right.

In each iteration, all the elements in that anti-diagonal must run parallelly.

Each anti-diagonal is calculated based on the values of the previous anti-diagonal. (S[i][j] depend on S[i][j-1], S[i-1][j] and S[i-1][j-1])

this is the code written in C language.

for (i=0;i<N+1;i++)
C[i]=0;
for (j=1;j<M+1;j++)
C[j]=0;

for (i=1;i<N+1;i++)
for(j=1;j<M+1;j++)
{
if (X[j-1]==Y[i-1])
C[i][j]=C[i-1][j-1]+1;
else
C[i][j]=max(C[i-1][j],C[i][j-1]);
}

X and Y are string (char*) C is matrix of integer N+1*M+1

So, how do I index the threads with this requirement in CUDA?

p.s. I’m very new to the whole CUDA programming topic and mostly new to parallel processing so please bear with me it is very important for me

it is very important for me

The best algorithm critically depends on the size of N and M, and also whether you’re doing this just once with a pair, or perhaps testing one string with many others, or with many independent pairs.

One basic strategy is iterative blockwise dynamic programming with imperfect edge communication… you iterate (perhaps redundantly) until each block converges,
This is a strategy that can work for Smith Waterman string matching (which is similar to LCS computation.)

The best algorithm critically depends on the size of N and M, and also whether you’re doing this just once with a pair, or perhaps testing one string with many others, or with many independent pairs.

One basic strategy is iterative blockwise dynamic programming with imperfect edge communication… you iterate (perhaps redundantly) until each block converges,
This is a strategy that can work for Smith Waterman string matching (which is similar to LCS computation.)

I will test for many pair of string (with length from 1000 to 10000 char)

the smith-waterman appraoch uses a system of scoring to get string matching and he compares the sequence with a database which isn’t my case.

my problem is how may i fix the block_size and the index of blocks to fil the matrix (which i have to get as result) by crossing it anti-diagonal by anti-diagonal (where the parallelism exists in my problem)

I will test for many pair of string (with length from 1000 to 10000 char)

the smith-waterman appraoch uses a system of scoring to get string matching and he compares the sequence with a database which isn’t my case.

my problem is how may i fix the block_size and the index of blocks to fil the matrix (which i have to get as result) by crossing it anti-diagonal by anti-diagonal (where the parallelism exists in my problem)