efficient parallel algorithm modifying large arrays of integers?

Hi all

I’m trying to come up with an efficient algorithm to do the parallel equivalent of the following serial algorithm (in pseudocode/C++):

[codebox]valFrom = Change integers with this value

valTo = Change integers that have “valFrom” value to this value

numToChange = number of integers with value “valFrom” to change

arraylen = length of the array

copy array of integers from device to host array named “array”

int count = 0;

for (int i=0; i<arraylen && count < numToChange; ++i)

{

if (array[i] == valFrom)

{

    array[i] = valTo

    count++;

}  

}

copy “array” host array back to device[/codebox]

Does anyone have any recommendations for turning this into a data parallel algorithm to run on the device? I was thinking something like a modified parallel reduction or perhaps sorting the integers into a new array and then assigning “numToChange” threads to the section of the array containing the relevant ints. But was just wondering if anyone has any better ideas?

Thanks!

nodlams