Hash map on Cuda

Hi Everyone,

I’m new to Cuda and really love it so far. I have no idea why everyone isn’t using it so widely but really feel excited every time I run code on it and don’t have to grab a coffee for it to complete(My caffeine consumption has gone down alot thanks to Cuda!). So thanks to this community for helping me learn so much.

My question is regarding creating a hashmap on Cuda. I have seen the thrust-extention hashmap, which didn’t work and I am trying the Cudpp hashmap module as well but its not going so well(I also tried the example in “cuda by example” in the appendix which was okay but not sure if it totally fits my needs because it creates a hashmap in cuda while I only need to read it).

Basically I’m trying to value a very very large batch of stock portfolio as quickly as possible. I get several million portfolio constantly that are in the form of two lists. One has the stock name(char*) and the other has the weight(int). I then use the stock name to look up a hashtable to get other data(daily value, % change,etc…) and then process it based on the weight. On a CPU in plain C it takes a while so I am interesting in trying it on a GPU. I have read and done the examples in cuda by example so I believe I know how to do most of this except the hash function(there is one in the appendix but it seems focused on adding to it while I only really want it as a reference since it’ll never change. I might be rough around the edges in cuda for example so maybe there is something I’m missing that is helpful for me in this situation, like using textual or some special form of memory for this). How would I structure this for best results should each block have its own access to the hashmap or should each thread or is one good enough for the entire GPU?

The way I was designing this was to have each thread process a portfolio and create as many threads as I can. The problem with this is that all threads will need this information to reference is there a way to have it accessible to all threads without reducing the performance? Both lists(name/weight) have about 5,000 items each but other than that my program really requires no other memory(I will save the results to the host memory directly).

Can anyone advice me on if its possible to have a high performance read only reference hashmap? Any ideas or suggestions would be great, or even showing me a similar problem so I can learn about it and apply it to my specific issue?

Thanks alot!

p.s. sorry if I’m not explaining anything right, I’m still learning so let me know if anything isn’t clear!

I would suggest one or two little changes:

[list=1]

Instead storing stock name (char *), you’d better store stock hash, or stock number on a table

Instead of doing 2 lists, you’d better have one list and read them together (ie 64bit read= 32bit int stock number + 32bit fp weight)

Ensure your list is a list (ie: length + data) instead a linked list

If you could do with only few fp for each stock, you may store them in constant memory to be cached (if not you are really in bad situation)

For myself I would consider doing it this way:

    Loop for each stock information kind I need to process

    Organize portfolio on single lists constitued of multiple of 32 entries

    Each entry containing 0|0 (not used), or stock# | weight

    Having each warp reading a list, 32 entries at once, until it reach the end (then it take the next portfolio to process) for coalesced global memory access

    Group stock informations, per kind, one array of 5000 entries per kind, to maximize cache efficiency. Eventually have them on constant memory depending on the architecture.

This way, the processing will be really straightforward, with huge coalesced global memory read, 256bytes at once, and stock information cached efficiently.

Last year I wrote a kernel for duplicate removal which used hashing.
Some of this is described in Technical Report RN/11/18 section 3.3
http://www-typo3.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Student_Information/RN_11_18.pdf
I also remember some very helpful comments in this thread…
http://forums.nvidia.com/index.php?showtopic=189165

                            Bill

    Dr. W. B. Langdon,
    Department of Computer Science,
    University College London
    Gower Street, London WC1E 6BT, UK
    http://www.cs.ucl.ac.uk/staff/W.Langdon/

CIGPU 2012 http://www.cs.ucl.ac.uk/staff/W.Langdon/cigpu
choose your background http://web4.cs.ucl.ac.uk/staff/W.Langdon/colour_telephone/bgcolor.html
A Field Guide to Genetic Programming
http://www.gp-field-guide.org.uk/
GP EM http://www.springer.com/10710
GP Bibliography http://www.cs.bham.ac.uk/~wbl/biblio/

RN_11_18.pdf (170 KB)