simple question on scan example from Data-parallel Algorithms...

Hello all.
I have an easy question on scan.

On page 23 of the pdf Data-parallel Algorithms & Data Structures (http://www.gpgpu.org/sc2007/SC07_CUDA_4_DataParallel_Owens.pdf) they talk about compaction. They give an example where we want to compact the vector
[1 0 1 1 0 0 1 0] associated with
[A B C D E F G H]
to the vector
[0 1 1 2 3 3 3 4],
which somehow helps us obtain [A C D, G], the non-null elements. They then show how to do a scan. OK. But after the scan, how does having the vector [0 1 1 2 3 3 3 4] help us to pull out the non-null elements? I’m sure I’m missing something simple here.

John

[0 1 1 2 3 3 3 4] means “the starting offset (in global memory) for each thread to ouput”.
a b c d e f g h
0 1 1 2 3 3 3 4
1 0 1 1 0 0 1 0
Now the threads scans the “starting offset array”, but ignore those flagged 0. so, they’ll output “a” at offset 0, “c” at offset 1, “d” at 2, “g” at 3. You may see my ppt at http://forums.nvidia.com/index.php?showtopic=38425.
Prefix sum is a popular concurrent control technique, to avoid write conflict.
hih.