BLAST (gene sequence alignment): has anyone done it on CUDA?

Hey guys,

I am working on implementing BLAST using CUDA. Has somebody already implemented it? I do not want to reinvent the wheel.


I am not familiar with the topic, but have you tried googling it?
“BLAST CUDA”…1Lf10ParPgkkAkA

Respected Sir,

I had already (and obviously) googled it before posting here. The paper that you directed me to is a Smith-Waterman implementation on CUDA and is not about BLAST on GPUs. But anyway, it was nice of you to reply.



Was trying to help :(

I didn’t want to lash out at you or anything. Sorry… it just felt strange when you advised me to google it, because that is what we all do first. :) No hard feelings. :)

You would be surprised…

Hey Sid,

If you get any updates, let me know. I’ve been thinking of tackling this one myself, as I’m still relatively new to sequencing. Currently we have a summer student working on CUDA and its application to alignment tools. Not sure how far along he’s gotten, though.


I’ve recently looked into this and it isn’t as straightforward as one might think. BLAST isn’t so much a single algorithm as a series of rather independent steps. In the standard NCBI implementation, most of the time is spent on simple (non-gapped) string matching, and I’m not sure the CPU can’t finish that step faster than it takes just to ship the data to the GPU. Later, Smith-Waterman is used to align the HSPs (high-scoring segment pairs), and the paper cited above shows how that can be done efficiently in CUDA, but it’s only about 10% of total processing time in BLAST.
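To make the "series of independent steps" concrete, here is a rough sketch of the stages described above: word seeding, non-gapped extension, then Smith-Waterman. It is plain Python for readability, not CUDA and not the NCBI code. The function names, the exact-match seeding, the toy scores and the thresholds are all assumptions for illustration.

```python
from collections import defaultdict

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scores, not NCBI defaults


def build_word_index(db, w):
    """Map every length-w word of the database sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - w + 1):
        index[db[i:i + w]].append(i)
    return index


def extend_ungapped(query, db, q, d, w, x_drop):
    """Extend an exact seed at query[q], db[d] in both directions without gaps.

    Each direction stops once the running score drops more than x_drop
    below the best score seen. Returns (score, query_start, db_start, length).
    """
    score = best = w * MATCH
    right = i = 0
    while q + w + i < len(query) and d + w + i < len(db):
        score += MATCH if query[q + w + i] == db[d + w + i] else MISMATCH
        i += 1
        if score > best:
            best, right = score, i
        elif best - score > x_drop:
            break
    score = best  # restart from the best right-hand end
    left = i = 0
    while q - 1 - i >= 0 and d - 1 - i >= 0:
        score += MATCH if query[q - 1 - i] == db[d - 1 - i] else MISMATCH
        i += 1
        if score > best:
            best, left = score, i
        elif best - score > x_drop:
            break
    return best, q - left, d - left, left + w + right


def find_hsps(query, db, w=4, x_drop=2, min_score=6):
    """Seed on exact words, extend without gaps, keep segments above min_score."""
    index = build_word_index(db, w)
    hsps = set()  # seeds on the same diagonal usually extend to the same segment
    for q in range(len(query) - w + 1):
        for d in index.get(query[q:q + w], ()):
            hsp = extend_ungapped(query, db, q, d, w, x_drop)
            if hsp[0] >= min_score:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


def smith_waterman_score(a, b):
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best
```

The seeding and extension loops are the part that would run over the whole database, which is why they dominate; the dynamic-programming step only sees the few segments that survive the score threshold.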

There are algorithms for string matching for GPU as well…4VH7FCniLF_hijQ


It’s true that the overhead of just transferring the database onto the GPU may be huge, but given the improving bandwidth of PCI-e and the speed-up one may expect once the data has reached the GPU, porting it is not a bad idea. Or one could try some other algorithm that is more sensitive than BLAST but not as fast. Still looking into the possibilities.
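A back-of-the-envelope way to frame the transfer-versus-speed-up trade-off is a simple break-even model. This is only a sketch: the function names are made up, any numbers you plug in are your own estimates rather than measurements, and it ignores overlapping transfers with compute or keeping the database resident on the GPU across queries.

```python
def gpu_total_seconds(db_bytes, pcie_bytes_per_s, cpu_seconds, kernel_speedup):
    """Estimated GPU wall time: one host-to-device copy plus the accelerated compute."""
    transfer = db_bytes / pcie_bytes_per_s
    return transfer + cpu_seconds / kernel_speedup


def break_even_speedup(db_bytes, pcie_bytes_per_s, cpu_seconds):
    """Smallest kernel speed-up at which the GPU path ties the CPU path.

    Returns None when the copy alone already takes at least as long as
    the CPU run, so no kernel speed-up can pay for the transfer.
    """
    transfer = db_bytes / pcie_bytes_per_s
    if transfer >= cpu_seconds:
        return None
    return cpu_seconds / (cpu_seconds - transfer)


if __name__ == "__main__":
    # Hypothetical inputs: 4 GB database, 4 GB/s effective PCI-e, 10 s on the CPU.
    print(gpu_total_seconds(4e9, 4e9, 10.0, 5.0))
    print(break_even_speedup(4e9, 4e9, 10.0))
```

If the CPU already finishes the scan in less time than the copy takes, the function returns None, which is the case raised earlier in the thread.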

Will surely update if there is any success.



I took a class in Bioinformatics a few years back. It’s not anywhere near my specialty (it was just an elective class), but I’m interested to read if you guys find (or write) any good string-matching code with CUDA.

I am curious how far you have gotten with your implementation of BLAST on CUDA. I have completed a variant of BLAST on the CUDA architecture and was curious how you attempted the same feat.

I look forward to your reply.


Btw, SSE 4.2 (i7) has superior string-matching instructions, directly useful for string matching, XML parsing, etc.
Check out the Intel guide. One can use those instructions to optimize this code heavily.

CUDA-BLASTP uses BLAST-like heuristics for protein alignments. I’m working on BASALT, which will seed alignments for large nucleotide sequences. What kind of alignments are you interested in?