Sequence Length in Smith Waterman

i got a problem with smith waterman algorithm on cuda. i need to make this algorithm work with large sequences… and CUDASW++ (found on tells me that there is a restriction 59K. i understand that this is because of architecture limitations, but the question is - is there anything that can help me to solve my problem?

That’s not a hardware limitation, that’s just that code’s method itself. It’s algorithmically possible to make unlimited matches with an improved algorithm.

The important concept for long matches is to not try to hold the entire dynamic program matrix in memory at once… that takes 4*n^2 bytes so it can rapidly eat your entire device memory for even modest sequences.
For local matching, this is easy to avoid since the match neighborhoods are limited (by the problem definition) but for global matching usually you need to do basic algorithm changes like use the Four Russians technique, or a sparse representation of the whole matrix. (I had decent success with this.)

Another method is to use iterative block updates, which is computationally expensive but memory efficent… basically iterate a block of the matrix to convergence then throw it out except the border values. Block to block intercommunication is via the border entries. When a border changes, you re-iterate the entire block again to propogate the changes. Again, computationally expensive but memory friendly.

CUDAlign is an implementation that does unlimited sequences comparison.

This is the paper that discuss about it.

The main idea of it is to process many separate blocks, as stated by SPWorley.

If you have questions, you can ask me.