I’m new to CUDA programming. I want to store some <key, value> pairs in GPU memory. I don’t know the number of distinct keys in advance. While processing the input data my kernel will discover the keys, and I want to maintain the number of data elements belonging to each key (something similar to a std::map container if I were implementing it on the CPU). Is there any way to do this on the GPU?
If you don’t produce many results, the most efficient way is probably to allocate a fixed-size array with plenty of space, and restart the kernel if the space turns out to be insufficient. You’d then use atomicAdd() to allocate slots inside the array.
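A minimal sketch of that idea (names and the key-extraction step are illustrative, not from the original post): each thread reserves a slot with atomicAdd() on a shared counter, and the host checks afterwards whether the capacity was exceeded, re-launching with a bigger buffer if so.

```cuda
// Sketch only: assumes `input` holds one key per element.
__global__ void collect(const int *input, int n,
                        int2 *out, int capacity, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int key = input[i];                 // stand-in for real key discovery
    int slot = atomicAdd(count, 1);     // reserve the next free slot
    if (slot < capacity)                // write only if we still fit
        out[slot] = make_int2(key, 1);
    // If *count ends up greater than capacity, the host copies the
    // counter back, enlarges the buffer, and launches the kernel again.
}
```

The counter ends up holding the total number of results even on overflow, so the host knows exactly how much space the re-launch needs.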
I looked into stream compaction. As I understand it, you’re suggesting I allocate space for every possible key value, use atomicAdd() to count the elements for each key, and then use stream compaction to keep only the keys with a non-zero count.
The problem with that approach is that my key range is very large (e.g. 1 to 10^10), so I cannot allocate space for all possible keys. Is there any way of maintaining only the keys that have a non-zero number of elements (using Thrust??). I’m only interested in the count of elements for each key (no need to store the elements themselves).
For example, if my input data is a set of locations (mapped to integer keys), the desired output is the number of records for each location.
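One way to get counts for only the keys that actually occur is a sort followed by a segmented reduction with Thrust (a sketch under the assumption that a device vector of keys has already been filled by the discovery kernel; function and variable names are illustrative):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/iterator/constant_iterator.h>

// Given one key per input record, produce (unique_keys, counts).
void count_per_key(thrust::device_vector<long long> &keys,
                   thrust::device_vector<long long> &unique_keys,
                   thrust::device_vector<int>       &counts)
{
    thrust::sort(keys.begin(), keys.end());   // group equal keys together
    unique_keys.resize(keys.size());
    counts.resize(keys.size());

    // Sum a constant 1 for every occurrence of a key: for each distinct
    // key this yields the number of records that carry it.
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::make_constant_iterator(1),
                                      unique_keys.begin(), counts.begin());

    // Shrink to the number of distinct keys actually found.
    unique_keys.resize(ends.first  - unique_keys.begin());
    counts.resize(ends.second - counts.begin());
}
```

Memory use here is proportional to the number of input records and distinct keys, not to the key range, so a range like 1 to 10^10 is no problem as long as the number of distinct keys that actually appear is manageable.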