I have a large sparse binary vector. It has between one million and ten million elements. Approximately half the elements are one and half are zero. I would like to create a new smaller vector where each element is an index into the non-zero elements of the sparse vector. For example:
This might be a common problem with SIMD programming. The binary vector corresponds to the result of some if-conditional applied to a vector of data and I need the indices so I can easily process those in the next step. How can I do this efficiently in CUDA? Is there a common name for this?
If you can manage to make the value of the non-zero elements identical to their position, you would have a standard stream compaction problem. GPUs have been successfully applied to that.
Your use case strikes me as a potentially common scenario, so there may already be code out there that does exactly what you need. I would start by looking what thrust has available in this regard.
Yes, I can do that. I can make the non-zero elements identical to their position. Do you think someone has this already programmed in CUDA? (I will check thrust)
I have never used stream compaction on GPUs myself, but I have seen multiple publications in the past ten years that discussed GPU implementations. If I recall correctly, this has come up a few times on these forums and on Stackoverflow as well, so you may want to just do a Google search to see what is already out there.
I mentioned the Thrust library because it is focused on array processing and makes for easy programming as I have heard from multiple people who had no prior experience programming GPUs.