How to convert this code to GPU

I have some C code like this:

int m=0;
for (int i=0; i < num_loop; i++) 
{
   if (condition)
      addr[m++] = i;
}

How can I parallelize this with a kernel, given that “m” is not the same as “i”?

This operation is called “stream compaction”. Thrust, for example, implements it in its copy_if functions.
The main building block needed to implement it is called a “prefix sum” (also known as a scan). A web search turns up several good texts on both.
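
To make the idea concrete, here is a minimal sketch in plain C of how the pieces might fit together. It is not GPU code: the scan is written serially, and the predicate keep() is a hypothetical stand-in for the “condition” in the question (here it just keeps even indices). The function name compact_indices is likewise made up for illustration. The point is that phases 1 and 3 have fully independent iterations, so each i could be handled by its own thread, and only phase 2 needs a parallel scan.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the question's "condition": keep even i. */
static int keep(int i) { return i % 2 == 0; }

/* Stream compaction in three phases.
   Returns the number of indices written to addr, or -1 on allocation failure. */
int compact_indices(int num_loop, int *addr)
{
    assert(num_loop >= 0);
    if (num_loop == 0)
        return 0;

    int *flag = malloc((size_t)num_loop * sizeof *flag);
    int *pos  = malloc((size_t)num_loop * sizeof *pos);
    if (!flag || !pos) {
        free(flag);
        free(pos);
        return -1;
    }

    /* Phase 1 (independent per i): evaluate the predicate. */
    for (int i = 0; i < num_loop; i++)
        flag[i] = keep(i) ? 1 : 0;

    /* Phase 2: exclusive prefix sum of the flags.
       pos[i] = number of kept elements before i.
       Written serially here; on the GPU this is the parallel scan. */
    int running = 0;
    for (int i = 0; i < num_loop; i++) {
        pos[i] = running;
        running += flag[i];
    }

    /* Phase 3 (independent per i): scatter.
       pos[i] is exactly the value "m" had in the original loop. */
    for (int i = 0; i < num_loop; i++)
        if (flag[i])
            addr[pos[i]] = i;

    free(flag);
    free(pos);
    return running;
}
```

With num_loop = 10 this writes 0, 2, 4, 6, 8 into addr and returns 5, the same result as the original serial loop with the same condition.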

Thanks, your answer really helps me a lot.

Thank you again for your answer. I tried to parallelize the method today, but a problem arises when the “if” condition is a function of both i and m.

int m=0;
for (int i=0; i < num_loop; i++) 
{
   if (func(i, m))
      addr[m++] = i;
}

How can I implement this with a “prefix sum”?

That would need some deeper thought, but my immediate intuition is that this is impossible to do in a work-efficient way in the general case: once the test depends on m, each iteration depends on the outcome of every earlier iteration.

Can you be more specific about func(i, m)?

Hello tera,
My code is shown below:

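int input[INPUT_SIZE] = {0, 0, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 7, 8,...};
int m = 1, Sm = 0;   /* m: length of the current run, Sm: index where it starts */
for (int i = 0; i < num_loop; i++)
{
    m = 1;
    /* extend the run while the next element equals the run's first element */
    while (Sm + m < num_loop && input[Sm] == input[Sm + m])
        m++;
    Sm += m;         /* advance to the start of the next run */
    output[i] = Sm;  /* end offset (one past the last element) of run i */
}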

The input is a sequence of elements, each of which repeats n times, where n > 1 and n < INPUT_SIZE.
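
One possible way to look at the loop above, offered as a sketch rather than a definitive answer: output[i] is the end offset of the i-th run of equal values, and whether position j ends a run depends only on input[j] and input[j + 1], not on m. Under that reading the problem reduces to stream compaction over the “last element of a run” flags. The function name run_end_offsets and the parameter n (the number of valid input elements) are assumptions made for illustration, and the scan is again written serially where a GPU version would use a parallel prefix sum.

```c
#include <assert.h>
#include <stdlib.h>

/* Writes the end offset (one past the last element) of every run of equal
   values in input[0..n) to output. Returns the number of runs, or -1 on
   allocation failure. output must have room for up to n entries. */
int run_end_offsets(const int *input, int n, int *output)
{
    assert(n >= 0);
    if (n == 0)
        return 0;

    int *flag = malloc((size_t)n * sizeof *flag);
    int *pos  = malloc((size_t)n * sizeof *pos);
    if (!flag || !pos) {
        free(flag);
        free(pos);
        return -1;
    }

    /* Phase 1 (independent per j): j ends a run if it is the final
       element or the next value differs. */
    for (int j = 0; j < n; j++)
        flag[j] = (j == n - 1 || input[j] != input[j + 1]) ? 1 : 0;

    /* Phase 2: exclusive prefix sum; pos[j] = number of runs ending before j. */
    int runs = 0;
    for (int j = 0; j < n; j++) {
        pos[j] = runs;
        runs += flag[j];
    }

    /* Phase 3 (independent per j): scatter the end offsets. */
    for (int j = 0; j < n; j++)
        if (flag[j])
            output[pos[j]] = j + 1;

    free(flag);
    free(pos);
    return runs;
}
```

For the first eight values of the sample input (0, 0, 1, 2, 2, 3, 3, 3) this produces the offsets 2, 3, 5, 8, matching what the serial loop computes for those four runs.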