Suppose I have a large array in device memory. Each byte in the array is a bit mask of 1’s and 0’s. Most bits are 0; the 1’s are random and fairly rare (on average, say, only 1 out of 1000 bits is set).
I need to convert this into a sorted list containing the positions of the set bits. This conversion is a major bottleneck in my program.
I’ve been able to cook up a function that reaches about 15 GB/s on my video card (GTX 560), but I’m wondering whether there’s a standard solution, or maybe something in Thrust, that can do it faster.
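
To pin down the output I mean, here is a minimal CPU reference sketch in C++. It is not the GPU function mentioned above, just a statement of the expected result. The name `set_bit_positions` is hypothetical, and the bit ordering (bit 0 is the least significant bit of byte 0, so bit b of byte i has position 8*i + b) is an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference (hypothetical helper): return the positions of all set bits
// in `masks`, in ascending order.
// Assumption: bit b of byte i has position 8*i + b, where b = 0 is the
// least significant bit of the byte.
std::vector<std::uint64_t> set_bit_positions(const std::vector<std::uint8_t>& masks) {
    std::vector<std::uint64_t> positions;
    for (std::size_t i = 0; i < masks.size(); ++i) {
        const std::uint8_t m = masks[i];
        if (m == 0) continue;  // set bits are rare, so most bytes are skipped here
        for (unsigned b = 0; b < 8; ++b) {
            if (m & (1u << b)) {
                positions.push_back(8 * static_cast<std::uint64_t>(i) + b);
            }
        }
    }
    return positions;  // already sorted, because bytes and bits are visited in order
}
```

For example, under the assumed bit ordering the input bytes `{0x00, 0x05, 0x00, 0x80}` would give the positions 8, 10 and 31. Any GPU version should produce the same list.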