Lookup Table

I’m using deep learning to learn word embeddings. I’m trying to create a lookup table where I can ask for a set of indices, say {3, 7, 8}, and retrieve the vectors stored at those indices as fast as possible. Any ideas or references you can direct me to?

I found some code here: https://github.com/facebook/fbcunn/blob/master/src/LookupTableGPU.cu but I don’t really understand what it’s doing.

I suppose the primary factors would be:

a) lookup table depth
b) number of threads participating, or simply the number of retrievals

Those should guide the optimal design, I would think.
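
To make the two factors concrete, here is a minimal sketch of the lookup as a plain row gather in CUDA. It is not the fbcunn implementation linked above, and the names (gather_rows, lookup, dim) and the block size of 128 are assumptions made up for illustration. It assumes the table is a dense row-major float array and that every index is in range. The number of retrievals sets the grid size (one block per requested index), and the vector length dim sets how much work each block does. The number of rows in the table does not appear in the kernel at all.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread block per requested index. The threads of a block stride across
// the row, so neighbouring threads read neighbouring floats (coalesced loads).
// Assumption: every value in `indices` is a valid row of `table`.
__global__ void gather_rows(const float* table, const int* indices,
                            float* out, int dim)
{
    int row = indices[blockIdx.x];
    const float* src = table + (size_t)row * dim;
    float* dst = out + (size_t)blockIdx.x * dim;
    for (int d = threadIdx.x; d < dim; d += blockDim.x)
        dst[d] = src[d];
}

// Hypothetical host helper, for demonstration only: it copies the table to
// the device on every call, which a real system would do once up front.
// CUDA error checking is omitted to keep the sketch short.
std::vector<float> lookup(const std::vector<float>& table, int dim,
                          const std::vector<int>& indices)
{
    size_t n = indices.size();
    std::vector<float> out(n * dim);
    if (n == 0 || dim == 0) return out;

    float* d_table = nullptr;
    float* d_out = nullptr;
    int* d_idx = nullptr;
    cudaMalloc(&d_table, table.size() * sizeof(float));
    cudaMalloc(&d_out, out.size() * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));

    cudaMemcpy(d_table, table.data(), table.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, indices.data(), n * sizeof(int),
               cudaMemcpyHostToDevice);

    // Grid size = number of retrievals; 128 threads per block is a guess
    // that should be tuned against dim.
    gather_rows<<<(unsigned)n, 128>>>(d_table, d_idx, d_out, dim);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_table);
    cudaFree(d_out);
    cudaFree(d_idx);
    return out;
}
```

Calling lookup(table, dim, {3, 7, 8}) would return the three requested rows back to back. For very short vectors a whole block per index wastes threads, so one warp per index (or several indices per block) may be a better fit; for long vectors the layout above keeps reads coalesced. Which one wins depends on the actual sizes, so it is worth timing both.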