Compress Sparse Binary Vector, Get Indices of Non-Zero Elements

I have a large sparse binary vector. It has between one million and ten million elements. Approximately half the elements are one and half are zero. I would like to create a new smaller vector where each element is an index into the non-zero elements of the sparse vector. For example:

Input vector: [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]
Output vector: [1, 2, 4, 8] 

This might be a common problem with SIMD programming. The binary vector corresponds to the result of some if-conditional applied to a vector of data and I need the indices so I can easily process those in the next step. How can I do this efficiently in CUDA? Is there a common name for this?

If you can manage to make the value of the non-zero elements identical to their position, you would have a standard stream compaction problem. GPUs have been successfully applied to that.

[0, 1, 1, 0, 1, 0, 0, 0, 1, 0] → [0, 1, 2, 0, 4, 0, 0, 0, 8, 0] → [1, 2, 4, 8]

Your use case strikes me as a potentially common scenario, so there may already be code out there that does exactly what you need. I would start by looking what thrust has available in this regard.

1 Like

Yes, I can do that. I can make the non-zero elements identical to their position. Do you think someone has this already programmed in CUDA? (I will check thrust)

I have never used stream compaction on GPUs myself, but I have seen multiple publications in the past ten years that discussed GPU implementations. If I recall correctly, this has come up a few times on these forums and on Stackoverflow as well, so you may want to just do a Google search to see what is already out there.

I mentioned the Thrust library because it is focused on array processing and makes for easy programming as I have heard from multiple people who had no prior experience programming GPUs.