I have some C code like this:
int m = 0;
for (int i = 0; i < num_loop; i++)
{
    if (condition)
        addr[m++] = i;
}
How can I parallelize it with a kernel, given that “m” is not the same as “i”?
This operation is called “stream compaction”. Thrust, for example, implements it in its copy_if methods.
The main operation needed to implement it is called a “prefix sum”. Google finds several good texts on both.
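To make the connection between the two concrete, here is a minimal sketch in plain C of compaction built on a prefix sum. It runs sequentially, but each of the three steps maps onto a parallel primitive: a per-element map, a scan, and a scatter. The names compact_indices and keep are hypothetical, and keep is only a placeholder for “condition” (it keeps even indices here); the sketch assumes the condition depends on i alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical predicate standing in for "condition": keep even indices. */
static int keep(int i) { return i % 2 == 0; }

/* Writes every i in [0, num_loop) that satisfies keep(i) into addr, in order.
   Returns the number of indices written (the final value of m), or -1 if
   memory allocation fails. */
int compact_indices(int num_loop, int *addr)
{
    assert(num_loop >= 0);
    if (num_loop == 0)
        return 0;

    int *flag = malloc((size_t)num_loop * sizeof *flag);
    int *pos = malloc((size_t)num_loop * sizeof *pos);
    if (!flag || !pos) {
        free(flag);
        free(pos);
        return -1;
    }

    /* Step 1 (map): each i is independent, so this is one thread per i. */
    for (int i = 0; i < num_loop; i++)
        flag[i] = keep(i) ? 1 : 0;

    /* Step 2 (exclusive prefix sum): pos[i] = number of kept elements
       before i. Sequential here; on a GPU this is a parallel scan. */
    int sum = 0;
    for (int i = 0; i < num_loop; i++) {
        pos[i] = sum;
        sum += flag[i];
    }

    /* Step 3 (scatter): again independent per i. pos[i] plays the role
       of m in the original loop. */
    for (int i = 0; i < num_loop; i++)
        if (flag[i])
            addr[pos[i]] = i;

    free(flag);
    free(pos);
    return sum;
}
```

The point of the scan is that pos[i] is known for every i before any write happens, so the sequential dependency on m disappears.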
Thx, your answer really helps me a lot.
Thank you again for your answer. I tried to parallelize the method today, but a problem occurs when the “if” condition is a function of both i and m.
int m = 0;
for (int i = 0; i < num_loop; i++)
{
    if (func(i, m))
        addr[m++] = i;
}
How can I implement this with a “prefix sum”?
That would need some deeper thought but my immediate intuitive reply would be that this seems impossible to do in a work-efficient way in the general case.
Can you be more specific about func(i, m)?
Hello tera:
My code is shown below:
int input[INPUT_SIZE] = {0, 0, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 7, 8, ...};
int m = 1, Sm = 0;
for (int i = 0; i < num_loop; i++)
{
    m = 1;
    while (Sm + m < num_loop && input[Sm] == input[Sm + m])
        m++;
    Sm += m;
    output[i] = Sm;
}
The input is a sequence of elements, each of which is repeated n times, with n > 1 and n < INPUT_SIZE.
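One way to read this loop, offered as a sketch rather than a definitive answer: output[i] is the index just past the end of the i-th run of equal values. Whether position j ends a run depends only on input[j] and input[j + 1], not on Sm or m, so the loop can be restated as a stream compaction over run boundaries. The plain C version below is sequential, but each step is a map, scan, or scatter. The name run_ends is hypothetical, and the sketch assumes that num_loop is the length of the input, that equal values are contiguous, and that only the first (number of runs) entries of output are needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Writes the exclusive end offset of each run of equal adjacent values in
   input[0..n) into output, in order. Returns the number of runs, or -1 if
   memory allocation fails. output must have room for up to n entries. */
int run_ends(const int *input, int n, int *output)
{
    assert(n >= 0);
    if (n == 0)
        return 0;

    int *flag = malloc((size_t)n * sizeof *flag);
    int *pos = malloc((size_t)n * sizeof *pos);
    if (!flag || !pos) {
        free(flag);
        free(pos);
        return -1;
    }

    /* Map: j is the last element of its run if it is the final element
       or differs from its right neighbour. Independent per j. */
    for (int j = 0; j < n; j++)
        flag[j] = (j == n - 1 || input[j] != input[j + 1]) ? 1 : 0;

    /* Exclusive prefix sum: pos[j] = number of runs that end before j. */
    int sum = 0;
    for (int j = 0; j < n; j++) {
        pos[j] = sum;
        sum += flag[j];
    }

    /* Scatter: the run ending at j is run number pos[j]; its end offset
       (the value Sm takes in the original loop) is j + 1. */
    for (int j = 0; j < n; j++)
        if (flag[j])
            output[pos[j]] = j + 1;

    free(flag);
    free(pos);
    return sum;
}
```

For the 15 values listed in the post, this yields 9 runs with end offsets 2, 3, 5, 8, 9, 11, 13, 14, 15, which matches what the sequential loop produces for its first 9 iterations.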