Matrix Normalisation

Part of the work I need to do involves taking a large regular 2D grid (a probability density function) and normalising the values (scaling each one so that they sum to 1). Are there any examples anyone could point me to of fast methods for summing a large set of numbers? Is there a good way of making this algorithm parallel? I assume that once the scaling factor is found, dividing each element by that factor is simple and efficient on the GPU.



I think the reduction example from the SDK is a good starting point for this.
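
Here is a rough sketch of how the two steps could fit together, assuming the SDK in question is the CUDA SDK and the grid is stored as a flat array of floats. It is not the SDK's code: the names partialSums, scale and normalise, the block size of 256 and the cap of 64 blocks are all my own choices for illustration. Each block sums its share of the grid in shared memory, the host adds up the handful of per-block results, and a second kernel multiplies every element by 1/sum.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // threads per block (assumed; must be a power of two)

// Each block accumulates a strided slice of `in`, then tree-reduces the
// per-thread totals in shared memory and writes one partial sum.
// Must be launched with exactly BLOCK threads per block.
__global__ void partialSums(const float* in, float* out, int n) {
    __shared__ float cache[BLOCK];
    int tid = threadIdx.x;
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += gridDim.x * blockDim.x)
        acc += in[i];
    cache[tid] = acc;
    __syncthreads();
    // Halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = cache[0];
}

// Multiply every element by the precomputed factor 1/sum.
__global__ void scale(float* data, int n, float inv) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= inv;
}

// Hypothetical host helper: normalises `grid` in place and returns the
// sum it had before scaling. Error checking is omitted for brevity.
float normalise(std::vector<float>& grid) {
    int n = static_cast<int>(grid.size());
    if (n == 0) return 0.0f;
    int blocks = (n + BLOCK - 1) / BLOCK;
    int sumBlocks = blocks < 64 ? blocks : 64;  // few blocks, each loops over many elements

    float *dData = nullptr, *dPartial = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dPartial, sumBlocks * sizeof(float));
    cudaMemcpy(dData, grid.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    partialSums<<<sumBlocks, BLOCK>>>(dData, dPartial, n);

    // Finish the sum on the host; there are at most 64 partial results.
    std::vector<float> partial(sumBlocks);
    cudaMemcpy(partial.data(), dPartial, sumBlocks * sizeof(float), cudaMemcpyDeviceToHost);
    double total = 0.0;
    for (float p : partial) total += p;

    if (total != 0.0) {  // leave the grid untouched if it sums to zero
        scale<<<blocks, BLOCK>>>(dData, n, static_cast<float>(1.0 / total));
        cudaMemcpy(grid.data(), dData, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(dData);
    cudaFree(dPartial);
    return static_cast<float>(total);
}
```

Finishing the sum on the host keeps the sketch short. For very large grids you would probably run the reduction kernel a second time over the partial sums instead, and if the grid already lives on the GPU you can skip the copies in and out.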

Thanks for pointing me in the right direction.