Segmented (or keyed) reduction

Hello all.

Does anyone have an idea how I could implement a segmented (keyed) reduction, similar to the one Thrust provides for CUDA?

Example: http://thrust.googlecode.com/svn/tags/1.2…a3144c2e741838a
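
To make the question concrete, here is a minimal sequential sketch of the behaviour I am after. It assumes that a "segment" is a maximal run of consecutive equal keys and that the reduction is a sum; the function name `reduce_by_key_reference` is made up for illustration and is not taken from Thrust. It is only a CPU reference to compare results against, not a parallel implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference for a keyed (segmented) reduction.
// Assumption: a segment is a maximal run of consecutive equal keys.
// Hypothetical helper, not part of any library.
// Returns one (key, sum) pair per segment, in input order.
template <typename Key, typename Value>
std::vector<std::pair<Key, Value>> reduce_by_key_reference(
    const std::vector<Key>& keys, const std::vector<Value>& values)
{
    std::vector<std::pair<Key, Value>> out;
    for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            out.back().second += values[i];        // same segment: accumulate
        } else {
            out.emplace_back(keys[i], values[i]);  // key changed: new segment
        }
    }
    return out;
}
```

For keys {1, 1, 2, 2, 2, 1} and values {10, 20, 1, 2, 3, 7} this gives three segments: (1, 30), (2, 6), (1, 7). Note that the last key 1 starts a new segment because it is not adjacent to the first run.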