I have some C code like this:

```
int m = 0;
for (int i = 0; i < num_loop; i++)
{
    if (condition)
        addr[m++] = i;   /* m advances only when condition holds */
}
```

How can I parallelize this with a kernel, given that “m” is not the same as “i”?

This operation is called “stream compaction”. Thrust, for example, implements it in its copy_if functions.

The main operation needed to implement it is called a “prefix sum” (or scan). Google turns up several good texts on both.
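To make the idea concrete, here is a minimal sketch in plain C of compaction via a prefix sum, applied to the first loop. It assumes a hypothetical predicate `condition(i)` (here: keep multiples of 3) and a hypothetical function name `compact_indices`. The three loops are written sequentially, but the map and scatter steps are independent per index i and the scan can be done with a parallel scan algorithm, so each step could become a kernel launch or a Thrust call such as `thrust::exclusive_scan`.

```c
#include <assert.h>

/* Hypothetical stand-in for "condition": keep indices that are multiples of 3. */
static int condition(int i) { return i % 3 == 0; }

/* Writes every i in [0, num_loop) with condition(i) true to addr[], in order,
   and returns how many were written (the final value of m in the serial loop).
   flags[] and pos[] are caller-provided scratch arrays of length num_loop. */
int compact_indices(int num_loop, int *flags, int *pos, int *addr)
{
    assert(num_loop >= 0);

    /* Map: one independent flag per i (one GPU thread per i). */
    for (int i = 0; i < num_loop; i++)
        flags[i] = condition(i) ? 1 : 0;

    /* Exclusive prefix sum: pos[i] = number of kept indices before i.
       Sequential here; on a GPU this would be a parallel scan. */
    int running = 0;
    for (int i = 0; i < num_loop; i++) {
        pos[i] = running;
        running += flags[i];
    }

    /* Scatter: each kept i writes to its own precomputed slot, so no two
       iterations touch the same element of addr[]. */
    for (int i = 0; i < num_loop; i++)
        if (flags[i])
            addr[pos[i]] = i;

    return running;
}
```

With `num_loop = 10`, this writes 0, 3, 6, 9 to `addr` and returns 4, the same result as the serial loop. The scan is what replaces the running counter `m`: every thread learns its output slot without waiting for the others.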

Thanks, your answer really helped me a lot.

Thank you again for your answer. I tried to parallelize the method today, but a problem arises when the if-condition is a function of both i and m:

```
int m = 0;
for (int i = 0; i < num_loop; i++)
{
    if (func(i, m))
        addr[m++] = i;
}
```

How can I implement this with a prefix sum?

That would need some deeper thought, but my immediate intuition is that this seems impossible to do in a work-efficient way in the general case: whether iteration i writes anything depends on m, and m depends on the outcome of every earlier iteration.

Can you be more specific about func(i, m)?

Hello tera:

My code is shown below:

```
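/* runs of repeated values; the remaining elements are elided */
int input[INPUT_SIZE] = {0, 0, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 7, 8, /* ... */};
int m = 1, Sm = 0;   /* m: length of the current run, Sm: start of the next run */
for (int i = 0; i < num_loop; i++)   /* one iteration per run */
{
    m = 1;
    /* extend the run while the next element equals its first element,
       staying inside the input array */
    while (Sm + m < INPUT_SIZE && input[Sm] == input[Sm + m])
        m++;
    Sm += m;
    output[i] = Sm;  /* index one past the end of run i */
}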
```

The input is a sequence of elements in which each value is repeated n times, where n varies from value to value, n > 1 and n < INPUT_SIZE.
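One observation, offered as a sketch rather than anything established earlier in the thread: in this loop `output[i]` is just the end index (exclusive) of the i-th run of equal values. The loop can therefore be recast as stream compaction over the run boundaries, which removes the dependence on `m`. The sketch assumes the input is an array `input` of length `n`; the function name `run_ends` is hypothetical. Each of its three loops is independent per index or is a scan, so each could become a kernel (or a Thrust call) on the GPU; they are written sequentially in plain C here.

```c
#include <assert.h>

/* Hypothetical helper: writes the exclusive end index of every run of equal
   values in input[0..n-1] to output[] and returns the number of runs.
   flags[] and pos[] are caller-provided scratch arrays of length n. */
int run_ends(const int *input, int n, int *flags, int *pos, int *output)
{
    assert(n > 0);

    /* Map: flag j if a run ends at j, i.e. j is the last element or
       the next element differs (short-circuit avoids reading input[n]). */
    for (int j = 0; j < n; j++)
        flags[j] = (j == n - 1 || input[j] != input[j + 1]) ? 1 : 0;

    /* Exclusive prefix sum: pos[j] = number of runs that end before j. */
    int runs = 0;
    for (int j = 0; j < n; j++) {
        pos[j] = runs;
        runs += flags[j];
    }

    /* Scatter: each flagged j writes its run's end index (j + 1) to its slot. */
    for (int j = 0; j < n; j++)
        if (flags[j])
            output[pos[j]] = j + 1;

    return runs;
}
```

For the 15 listed example elements this yields 2, 3, 5, 8, 9, 11, 13, 14, 15, which is what the sequential loop produces for those elements. On the GPU, the scan and scatter steps could also be expressed with `thrust::copy_if` using the flags as a stencil.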