I have a large sparse binary vector with between one million and ten million elements. Approximately half of the elements are one and half are zero. I would like to create a new, smaller vector in which each element is the index of a non-zero element of the input vector. For example:
Input vector: [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]
Output vector: [1, 2, 4, 8]
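To make the intended result precise, here is a minimal sequential sketch of the operation in C++. The function name `nonzero_indices` and the `uint8_t`/`uint32_t` element types are my own assumptions for illustration, not part of any library. What I am after is a parallel equivalent of this loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential reference: collect the positions of all non-zero flags, in order.
// `nonzero_indices` is a hypothetical name used only for illustration.
std::vector<std::uint32_t> nonzero_indices(const std::vector<std::uint8_t>& flags) {
    std::vector<std::uint32_t> indices;
    // Roughly half the flags are set in my data, so reserve about that much.
    indices.reserve(flags.size() / 2);
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i] != 0) {
            // Indices fit in 32 bits because the input has at most ~10 million elements.
            indices.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return indices;
}
```

Called on the ten-element input above, this returns the four indices shown in the output vector. The difficulty on the GPU is that the write position of each index depends on how many non-zero elements precede it.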
This might be a common problem in SIMD programming. The binary vector is the result of applying a conditional test to each element of a data vector, and I need the indices so that I can process only the matching elements in the next step. How can I do this efficiently in CUDA? Is there a common name for this operation?