I’m new to CUDA and really love it so far. I have no idea why it isn’t more widely used, but I get excited every time I run code on it and don’t have to grab a coffee while waiting for it to complete (my caffeine consumption has gone down a lot thanks to CUDA!). So thanks to this community for helping me learn so much.
My question is about creating a hashmap in CUDA. I have seen the thrust-extension hashmap, which didn’t work for me, and I am trying the CUDPP hashmap module as well, but it’s not going so well. I also tried the example in the appendix of “CUDA by Example”, which was okay, but I’m not sure it fits my needs because it builds the hashmap in CUDA while I only need to read from it.
Basically, I’m trying to value a very large batch of stock portfolios as quickly as possible. I constantly receive several million portfolios, each in the form of two lists: one holds the stock names (char*) and the other holds the weights (int). I use each stock name to look up a hashtable to get other data (daily value, % change, etc.) and then process it based on the weight. On a CPU in plain C this takes a while, so I am interested in trying it on a GPU. I have read and worked through the examples in “CUDA by Example”, so I believe I know how to do most of this except the hash function. (There is a hash table in the appendix, but it seems focused on adding entries, while I only want mine as a reference since it will never change. I’m still rough around the edges in CUDA, so maybe I’m missing something that would help in this situation, like using texture memory or some other special form of memory.) How should I structure this for best results? Should each block have its own copy of the hashmap, should each thread, or is one copy good enough for the entire GPU?
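To make the lookup concrete, here is a minimal sketch in plain C of the kind of read-only table I have in mind. Everything in it is an assumption for illustration: the names (`StockTable`, `table_insert`, `table_find`), the fixed-width 8-byte stock names, the tiny capacity, the FNV-1a hash, and linear probing. The idea is that the table is filled once on the host and stored in flat, pointer-free arrays, so in principle it could be copied to the device as a single block and only ever read there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN  8    /* hypothetical fixed-width stock name, NUL-padded */
#define TABLE_CAP 16   /* hypothetical capacity; must be a power of two */

/* Per-stock reference data (hypothetical fields). */
typedef struct {
    float daily_value;
    float pct_change;
} StockData;

/* Flat table with no pointers inside, so it can be copied as one block. */
typedef struct {
    char      keys[TABLE_CAP][NAME_LEN];
    StockData data[TABLE_CAP];
    int       used[TABLE_CAP];
    int       count;
} StockTable;

/* Copy a name into a zero-padded fixed-width key (long names are truncated). */
static void pad_name(const char *name, char out[NAME_LEN]) {
    memset(out, 0, NAME_LEN);
    strncpy(out, name, NAME_LEN - 1);
}

/* FNV-1a hash over all NAME_LEN bytes of the padded key. */
static uint32_t hash_name(const char key[NAME_LEN]) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < NAME_LEN; ++i) {
        h ^= (uint8_t)key[i];
        h *= 16777619u;
    }
    return h;
}

static void table_init(StockTable *t) {
    memset(t, 0, sizeof *t);
}

/* Host-side build step; assumes each name is inserted at most once. */
static void table_insert(StockTable *t, const char *name, StockData d) {
    /* Keep at least one slot empty so that probing always terminates. */
    assert(t->count < TABLE_CAP - 1);
    char key[NAME_LEN];
    pad_name(name, key);
    uint32_t slot = hash_name(key) & (TABLE_CAP - 1);
    while (t->used[slot])
        slot = (slot + 1) & (TABLE_CAP - 1);   /* linear probing */
    memcpy(t->keys[slot], key, NAME_LEN);
    t->data[slot] = d;
    t->used[slot] = 1;
    t->count++;
}

/* Read-only lookup: returns NULL if the name is not in the table. */
static const StockData *table_find(const StockTable *t, const char *name) {
    char key[NAME_LEN];
    pad_name(name, key);
    uint32_t slot = hash_name(key) & (TABLE_CAP - 1);
    while (t->used[slot]) {
        if (memcmp(t->keys[slot], key, NAME_LEN) == 0)
            return &t->data[slot];
        slot = (slot + 1) & (TABLE_CAP - 1);
    }
    return NULL;
}
```

Since nothing is inserted after the build step, the lookup needs no locks or atomics, which is the main difference from the appendix example as I understand it.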
The way I was designing this was to have each thread process one portfolio and to create as many threads as I can. The problem is that every thread needs the same reference data. Is there a way to make it accessible to all threads without reducing performance? Both lists (name/weight) have about 5,000 items each, but other than that my program requires no other memory (I will save the results directly to host memory).
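As a rough illustration of that one-portfolio-per-thread design, here is a plain C sketch in which an ordinary loop stands in for the thread grid. The function names are hypothetical, and I am making two simplifying assumptions: the "processing" is just a weighted sum of daily values, and each stock name has already been resolved to an integer index into a shared price array.

```c
#include <assert.h>
#include <stddef.h>

/* Value one portfolio: a run of (stock_id, weight) pairs.
   stock_id is a hypothetical integer index into the shared daily_value array. */
static float value_portfolio(const int *stock_ids, const int *weights,
                             int n_items, const float *daily_value) {
    float total = 0.0f;
    for (int i = 0; i < n_items; ++i)
        total += (float)weights[i] * daily_value[stock_ids[i]];
    return total;
}

/* Stand-in for the kernel launch: iteration p plays the role of thread p.
   All portfolios are stored back to back, items_per_portfolio entries each,
   and every iteration reads the same read-only daily_value array. */
static void value_all(const int *stock_ids, const int *weights,
                      int n_portfolios, int items_per_portfolio,
                      const float *daily_value, float *out) {
    for (int p = 0; p < n_portfolios; ++p) {
        size_t base = (size_t)p * (size_t)items_per_portfolio;
        out[p] = value_portfolio(stock_ids + base, weights + base,
                                 items_per_portfolio, daily_value);
    }
}
```

Each iteration only reads the shared data and writes its own `out[p]`, so the iterations are independent of one another, which is what makes me think one thread per portfolio is a reasonable fit.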
Can anyone advise me on whether it’s possible to have a high-performance, read-only reference hashmap? Any ideas or suggestions would be great, or even a pointer to a similar problem so I can learn from it and apply it to my specific issue.
P.S. Sorry if I haven’t explained something well. I’m still learning, so let me know if anything isn’t clear!