Hi,
I’m looking for a particular algorithm, which I’ve not had much luck searching for.
The principal idea is that I have data arranged such that there are a number of objects stored in bins:
0 1 2 3 4 5 6 7 8 9 10

 2  5  2  1 2 3
This data can only be accessed from one big contiguous block, which I have used a parallel radix sort on the input data to organise it like so:
2 2 4 4 4 4 4 6 6 8 9 9 10 10 10
Now in order to process this data in parallel (this part is distributed), I need to generate the start indices of this data, which should look something like this (apologies if there are any typos in this data):
0 1 2 3 4 5 6 7 8 9 10
0 0 0 2 2 7 7 9 10 11 13
This kind of feels like an inverse stream compaction and I’m fairly sure the key to this is a scan of some sort, but I trying to figure out the exact algorithm that might be involved.
I’ll possibly figure this out moments after I’ve posted this, but any suggestions would be most welcome!
Thanks,
Dan