similar string search possible with CUDA?

Louis_Coder · December 7, 2008, 1:27pm

Hi!

I want to do a similar string search.

->Is this possible using CUDA? (can you run such a stuff on a GPU?)

It would work in the following way:

-the user enters words, names, sentences, whatever.
-these words are split up into 3-grams (sub strings of the length 3).
-When something new is entered by the user, the new input is
split up into 3-grams, too, and a previous input with the most
matching 3-grams is searched.

Example:

PREVIOUS INPUT: hello dave
3-grams: hel,ell,llo,lo ,o d, da,dav,ave*
PREVIOUS INPUT: bye dave
3-grams: bye*,ye ,y d, da,dav,ave

CURRENT INPUT: bye grave

3-grams of the current input:
bye,ye ,e g, gr,gra,rav,ave

The second previous input would be chosen as similarest one as
it has 3 matching 3-grams (marked with *), the first input only 1.

Can this be implemented (ingeniously) with CUDA?
I could test many previous answers against the current input at once
using one core per previous input (I think ?!?).

Is there a better way to do the similar search
(using trees or so, if yes, how does this tree look like)?

Thanks a lot,

Louis

Louis_Coder · December 10, 2008, 3:55pm

Has really nobody an idea?
Come on, you are surely all cracks in what concerns CUDA?? ;)
I’m just a beginner.

Ailleur · December 10, 2008, 4:20pm

As long as you have multiple instances of the same exact problem, things should go smoothly. But you would need a lot (thousands+) of previous strings for it to be worth the effort (im guessing).

I would use one thread per previous input.

I could see some problems as to where to store the strings since you cant malloc inside a kernel. Unless you have an upper bound on the size of the input strings.

yummig · December 10, 2008, 4:44pm

Consider going down the string alignment path, you may have better luck. A similar thing has been done with the Smith-Waterman Algorithm in CUDA which is used for DNA alignment. Be careful of implementing trees in CUDA since they can grow quickly producing varied branching amongst threads, plus memory access patterns may not be consistent and so you will not be able to take advantage of memory coalescing.

Louis_Coder · December 10, 2008, 5:19pm

Aaah, now the answers come ;)
Thanks folks!

@Ailleur: ok, no problem, the string length is limited to 32 chars. I have a few million of them (5 M or so).
One thread per string - two hundred threads on a GTX260 - this gonna rock!!!

@yummig: could you please give me some details what a ‘string alignment path’ is? Maybe I know it and just don’t know
the (English) name. Sorry, I’m a newbie :unsure:

yummig · December 11, 2008, 9:17am

Sure, I gave a talk/lecture to some undergraduates recently regarding dynamic programming in this area, see the attatched PDF file. If you still get stuck just give me a shout. I hope this gets you closer to what you want to acheive
Essential_Algorithms.pdf (2.65 MB)

Reimar · December 11, 2008, 11:14am

Uh, certainly not. Two hundred threads is the minimum to get something useable, but I doubt it will be worth the effort.

Not sure about GTX260, but on GTX280 a really good value would be

30 processors * 8 blocks per processor * 64 threads per block = 15360 threads

If you want it to scale well with future hardware you are two orders of magnitude off.

Louis_Coder · December 12, 2008, 1:13pm

Uh, ok! ;)

I thought of 200 threads because this GTX260 is said to have around 200 stream processors.

On my Intel Core 2 Duo you can set up 2 threads, one per core (actually 4 I think because of hyper

threading). Correct me if I’m wrong.

I’ll better read the CUDA documentation before I continue talking…

Thanks, yummig, I’ll read your document…

E.D_Riedijk · December 12, 2008, 1:42pm

You will want (hundreds of)thousands of threads for high CUDA performance. In general writing CUDA programs while keeping the number of Multiprocessors in mind is flawed.

MisterAnderson42 · December 12, 2008, 1:48pm

CPUs and GPUs are very different architectures. As an extremely crude approximation, you can think each GPU “processor” as being massively hyperthreaded. Specifically, on the GTX 280, each MP (made up of 8 ALUs) can keep 1024 threads in flight. This interleaved execution hides the latency of memory accesses. It is all explained very nicely in the first few chapters of the programming guide.

Louis_Coder · December 12, 2008, 1:49pm

@yummig:

Ok I checked the document you sent me.
Maybe I have not understood it, but how can I use string alignment for a similar string search?
The aim of the string alignment is to find out what’s missing and what’s surplus within an input compared to a stored dataset.
But what if this dataset consists of a few million items?
I suppose creating the dynamic programming matrix for 5 million items is MUCH slower than comparing a few million 3-grams?
Did I not understand your document or am I right?

Topic		Replies	Views
String searching in CUDA CUDA Programming and Performance	4	5285	February 23, 2010
Looking for an example of simple word search in line of text CUDA Programming and Performance	9	3353	November 8, 2010
How do I use cuda to run a search\ while loop in python; non number crunching strings CUDA Programming and Performance cuda , kernel , python	3	913	October 31, 2021
First approach to cuda, need optimization or work are slow? CUDA Programming and Performance	4	848	December 11, 2013
BLAST (Gene Sequence alignment) Has anyone done it on CUDA CUDA Programming and Performance	13	5501	March 19, 2010
CUDA and strings CUDA Programming and Performance	0	938	March 27, 2018
Asking for an algorithm about massive data processing CUDA Programming and Performance	14	17558	January 29, 2010
Need a advance CUDA Programming and Performance	0	499	May 4, 2018
Help for sorting CUDA Programming and Performance	0	467	April 2, 2022
Some advices on improving performance? CUDA Programming and Performance	0	2660	May 12, 2008

similar string search possible with CUDA?

Related topics