I have code in which the input is an array and the output is the same array.
__global__ void kernel_foo(char *chArray)
{
    int count = 0;
    // In the real kernel this loop is replaced by the thread index;
    // the for loop is shown here only to make the logic easier to follow.
    for (int i = 1; i < 312; i++)
    {
        int x = i; // x is set to i before the if
        if (chArray[i] == chArray[i + 1])
        {
            // body here, count is also updated here
            switch (count)
            {
            case 0:
                chArray[x] = 0;
                break;
            case 1:
                chArray[x + 1] = 312;
                break;
            default:
                chArray[x + 1] = 0;
            }
        }
        else
        {
            chArray[x] = 100;
            chArray[x + 1] = 150;
        }
    }
}
My question is:
In this code, chArray is used in the if condition and is also updated inside the body of that if. When I call this kernel I get the wrong result.
You aren't getting the correct result because threads run in a non-deterministic order and you are overwriting values in the array, so one thread may read an element that another thread has already changed. Why not just write the output into a new array, say chArray2? You can ping-pong between the two arrays (swap which one is input and which is output on each launch) to avoid extra copying.
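The ping-pong idea could look roughly like the sketch below. It is not the original kernel: the per-element rule (`next_value`), the buffer names, the sizes, and the number of passes are all assumptions made up for illustration, and the `count` logic is left out. The point it shows is that each thread reads only from the input buffer and writes only its own element of the output buffer, and the host swaps the two pointers between launches.

```cuda
#include <cstdio>
#include <utility>
#include <cuda_runtime.h>

// Hypothetical per-element rule (an assumption, not the original logic):
// 0 if the element equals its right neighbour, otherwise 100.
__host__ __device__ inline unsigned char next_value(const unsigned char *in, int i, int n)
{
    return (i + 1 < n && in[i] == in[i + 1]) ? 0 : 100;
}

// Each thread reads only from `in` and writes only out[i],
// so no thread can see a value that another thread has just overwritten.
__global__ void kernel_foo(const unsigned char *in, unsigned char *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = next_value(in, i, n);
}

int main()
{
    const int n = 312;
    unsigned char host[n];
    for (int i = 0; i < n; ++i)
        host[i] = (unsigned char)(i / 2); // sample data: pairs of equal values

    unsigned char *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, n);
    cudaMalloc(&d_b, n);
    cudaMemcpy(d_a, host, n, cudaMemcpyHostToDevice);

    unsigned char *src = d_a, *dst = d_b;
    const int passes = 2; // illustrative number of launches
    for (int p = 0; p < passes; ++p) {
        kernel_foo<<<(n + 127) / 128, 128>>>(src, dst, n);
        std::swap(src, dst); // the output of this pass is the input of the next
    }

    // After the final swap, `src` points at the most recent result.
    cudaMemcpy(host, src, n, cudaMemcpyDeviceToHost);
    printf("first values: %d %d %d %d\n", host[0], host[1], host[2], host[3]);

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

Launches on the same stream run in order, so no extra synchronization is needed between passes here. If a thread also had to write element i+1, as in the original code, two threads would write the same output element, and that conflict would need to be resolved separately.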
Actually, count is updated inside the switch cases. In the code I gave earlier, case 0 and case 1 reset count to 0, and the default case increments it by 1. In the else part count is also incremented by 1 (I did not mention this earlier because I wanted to show a different thing), and x = i is set above the if condition.
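Put together, one possible reading of that description as a plain sequential host function is sketched below. The function name `reference_foo`, the use of `std::vector<int>` (so that values such as 312 fit), and the loop bounds taken from the array size are assumptions for illustration only. It makes one thing visible: count is carried from one iteration to the next, so each iteration depends on all the earlier ones.

```cuda
#include <vector>
#include <cstddef>

// Sequential reference, assuming the update rules described in the post above.
// Uses int elements (an assumption) so that 150 and 312 are representable.
void reference_foo(std::vector<int> &a)
{
    int count = 0;
    // Mirrors for (i = 1; i < 312; i++) when a.size() == 313.
    for (std::size_t i = 1; i + 1 < a.size(); ++i) {
        std::size_t x = i;
        if (a[i] == a[i + 1]) {
            switch (count) {
            case 0:
                a[x] = 0;
                count = 0;      // reset in case 0
                break;
            case 1:
                a[x + 1] = 312;
                count = 0;      // reset in case 1
                break;
            default:
                a[x + 1] = 0;
                count++;        // incremented in the default case
            }
        } else {
            a[x] = 100;
            a[x + 1] = 150;
            count++;            // incremented in the else branch
        }
    }
}
```

Because iteration i uses the count left behind by iteration i-1 and may also read an element that iteration i-1 wrote, simply replacing the loop index with the thread index does not reproduce this sequential result.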
Hi KUNDAN KUMAR
I am really interested in your function.
I have read it many times, but I still cannot understand it clearly.
Can you post something easier to understand, like pseudocode, in as much detail as possible?
:)