it’s longest common subsequence problem.
I have an N x M matrix of integers (which is the result of the dynamic algorithm in LCS).
I’m implementing an algorithm which requires the following to be performed:
There are M+N anti diagonals in this matrix. (anti - diagonals are parallel lines from top edge to left edge and right edge to bottom edge)
I need a for loop with M+N iterations with each iteration computing one anti-diagonal starting from the top left and ending at bottom right.
In each iteration, all the elements in that anti-diagonal must run parallelly.
Each anti-diagonal is calculated based on the values of the previous anti-diagonal. (S[i][j] depend on S[i][j-1], S[i-1][j] and S[i-1][j-1])
this is the code written in C language.
X and Y are string (char*) C is matrix of integer N+1*M+1
So, how do I index the threads with this requirement in CUDA?
p.s. I’m very new to the whole CUDA programming topic and mostly new to parallel processing so please bear with me