Help with data structure need suggestions on alternatives to maintain a structure

let me give a little background on my problem.
I have a set of numbers whose values are pow(Z,R) Mod N.
Lets say N is 100 and Z,R are between 1 and 10. Now I have 100 values (1 to 100) and I want to store all Z,R pairs that correspond to a value between 1 to 100. For example:
1----> 1^1Mod100, 1^2Mod100, 1^3Mod100… (I need to store [1,1],[1,2],[1,3]…[1,10]…)
2---->2^1Mod100, …
3---->3^1Mod100, …

25---->25^1Mod100, 5^3Mod100, … (I need to store [25,1],[5,3]…)


So this is something like a hash map with keys from 1 to 100 and values that are lists of pairs of numbers.
I am hoping to do something like "get all z,r pairs for which z^rMod100 is 5.

Any suggestions on how to best maintain this structure on the device? (Hopefully with simple arrays).
There will be 1000s of threads trying to read this structure.

constant cache should work, what is whole size of a table? Or 2D texture if you do not go with fermi.

The whole size would come out to 100 (values of N) + (10*10) ( all combinations of Z,R from 1 to 10) = 200 floats/longs (this should eventually be a little scalable but will suffice as an example)

I dont care what type of memory i use (cache/texture/even global). I need to decide on a way of arranging the data on the device such that fetching is easy.

Can you elaborate a little on how I would use a 2d texture (or any 2d structure) for this?

If you can obey bank conflict constraints and the table fits into smem, consider putting it there. You can read 16 different addresses at a time from smem, but only one from cmem.

Edit: In my experience completely pseudo-random lookups (i.e. you can’t control read address at all- cryptography data) also perform better on smem then cmem

Again, I am not asking about what would perform best. I am still deciding on how i could put the data up there.

I am expecting something like:

Store values of N in an array of pointers. Store the Z,R pairs in array of float2s…

something like that.

Once i have a structure, I can decide on where to put what.

You need to elaborate “such that fetching is easy” then. You could do simply
“float modpow[100][10][10]”, and put -1 in it when there’s no value (since pow(x,y) MOD 100 will never be negative), then you can scan all through modpow[100][][] for positive values. The fetching “looks easy” to me, it is completely ineffective though.

I think at this point I need to give a more realistic example:

My N is 65521. Z is 100 and R is 10. What I am trying to do is, instead of looping through all possible values of Z^RMod N in EACH thread, I precompute all Z^R Mod N values and store them. Now each thread just has to do a “look up” to match with another result that they will compute - this look up would ideally be a hash (but i guess that is a little hard for me to implement right now)