I am new to the CUDA/GPU space, and I am trying to run a search loop more efficiently using CUDA. I understand that an NVIDIA GPU has many more cores than a CPU. How would I go about maximizing core utilization until I find a search match? Is it possible to run copies of the Python script on multiple cores? I also understand that the Python script has to be compiled to machine code to take advantage of the GPU’s full potential. Please enlighten me as to how this is achieved, along with any examples of this sort of programming you may be privy to. Thanks in advance.
You might want to be a bit more specific as to what you are trying to accomplish. Obviously, there is no such thing as “the” search problem. Aspects to consider (off the top of my head):
(1) Are you looking for an exact match, approximate match, match with wild cards, general regular expression match?
(2) Are you looking only for the first match (in text order) or all matches?
(3) How many different symbols are these strings composed of? E.g. 4 for DNA, 26 for English alphabetic letters, any of 256 different byte values, any of 2^32 different words as in UTF-32?
(4) Is the length of the strings known in advance, or are string-terminating encodings being used as in C/C++?
(5) On average, how long (in symbols) are the strings being searched in?
(6) On average, how long (in symbols) are the strings being searched for?
(7) On average, how many strings are being searched in?
(8) On average, how many strings are being searched for?
Have you searched the literature? The following came up in the first few hits in a Google search:
Kaldewey T, Hagen J, Di Blas A, Sedlar E., “Parallel search on video cards.” In First USENIX Workshop on Hot Topics in Parallelism (HotPar’09) 2009 Mar 30.
Google Scholar counts 74 citations for this publication, so presumably there is some interesting and relevant follow-up work. I see one follow-up by the authors themselves:
Kaldewey T, Di Blas A, “Large-scale GPU search.” In GPU Computing Gems Jade Edition, Morgan Kaufmann, 2012, pp. 3-14
Hi, thanks for the reply. I am working on a password recovery application, and I want to make the matching of hexadecimal candidates against a list of common words and integers more efficient. The application loops through a generator and compares each generated hexadecimal possibility to the list; if there is no match, it repeats the process until a match is found, all in Python. I was wondering if there is a way to optimize this process by utilizing the thousands of GPU cores to run multiple instances of the script, thereby generating and checking thousands of possibilities per minute. That is the gist of what I am trying to do.
The generated and search strings are about 44 characters long.
The search list is about 500 strings.
Thanks in advance.
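To make the loop you describe concrete, here is a minimal CPU-side sketch of it. The hash function (MD5) and the tiny target list are placeholders for illustration only, and one easy win independent of the GPU is storing the ~500 targets in a set for O(1) membership tests rather than scanning a list on every candidate:

```python
import hashlib
from itertools import product

# Hypothetical stand-in for the ~500-entry target list; a set gives O(1)
# membership tests instead of scanning a Python list for every candidate.
targets = {hashlib.md5(w.encode()).hexdigest() for w in ["cab"]}

def search(alphabet, max_len):
    """Enumerate candidates, hash each one, and stop at the first match."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            digest = hashlib.md5(candidate.encode()).hexdigest()
            if digest in targets:
                return candidate, digest
    return None

print(search("abc", 3))
```

On a GPU (e.g. via Numba’s `@cuda.jit` or a CUDA C kernel), the inner loop would be replaced by launching one thread per candidate index: each thread reconstructs its candidate from its global thread ID, hashes it on the device, and compares against the target digests held in device memory.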
There is a password “recovery” tool called hashcat that includes GPU acceleration. It is hosted on GitHub. Whether it offers a Python interface, I do not know.
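Even without a Python API, hashcat is normally driven from the command line, so one could shell out to it from a Python script using the standard library. The file names below are placeholders, and `-m 0` (MD5 mode) / `-a 0` (straight wordlist attack) are just example settings that would need to match the actual hash type:

```python
import subprocess

# Placeholder files: hashes.txt holds the target digests, wordlist.txt the
# candidate words. "-m 0" selects MD5 mode and "-a 0" a straight wordlist
# attack; choose the mode that matches your actual hash type.
cmd = ["hashcat", "-m", "0", "-a", "0", "hashes.txt", "wordlist.txt"]
print(" ".join(cmd))

# Uncomment once hashcat is installed and the input files exist:
# subprocess.run(cmd, check=True)
```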