Parallelize the Execution.

KUNDAN_KUMAR · April 29, 2009, 1:20pm

Hi All,

I have some code in which the input is an array and the output is written to the same array.

__global__ void kernel_foo(char *chArray)
{
	int count = 0;
	for (int i = 1; i < 312; i++) // this loop is replaced by the thread index; the for loop is shown only for clarity
	{
		if (chArray[i] == chArray[i+1])
		{
			// body here, count is also updated here
			switch (count)
			{
				case 0:
					chArray[x] = 0;
					break;
				case 1:
					chArray[x+1] = 312;
					break;
				default:
					chArray[x+1] = 0;
			}
		}
		else
		{
			chArray[x] = 100; chArray[x+1] = 150;
		}
	}
}

My question is:

In this code, chArray is used in the if condition and is also updated inside its body. When I call this kernel, I get the wrong result.

Can this code be parallelized with threads?

ctierney42 · April 29, 2009, 4:30pm

You aren’t getting the correct result because threads run in a non-deterministic order and you are overwriting values in the array that other threads may still need to read. Why not just put the output into a new array, say chArray2? You can ping-pong between the two arrays to avoid extra copying. For example:

cudaMalloc((void **) &d_chArray, …);
cudaMalloc((void **) &d_chArray2, …);
cudaMemcpy(d_chArray, chArray, …);
kernel_foo<<<cuda_grids, cuda_threads>>>(d_chArray, d_chArray2);
cudaMemcpy(chArray, d_chArray2, …);
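
A fuller sketch of the two-array idea might look like the following. It is only an illustration under some assumptions: the names kernel_foo_two_buffers, run_two_buffers, in, out, and n are hypothetical, each thread handles one element, and the count logic from the original post is left out because it carries a dependency between iterations. Each thread writes only its own out[i], so no two threads write the same location.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: reads only from `in`, writes only to `out`.
// Simplified rule (an assumption, not the original algorithm):
// 0 if an element equals its right neighbour, 100 otherwise.
__global__ void kernel_foo_two_buffers(const char *in, char *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i + 1 < n) {
        out[i] = (in[i] == in[i + 1]) ? 0 : 100;
    } else {
        out[i] = in[i];  // last element has no right neighbour; copy it
    }
}

// Hypothetical host helper; returns the first CUDA error encountered.
cudaError_t run_two_buffers(const char *h_in, char *h_out, int n)
{
    char *d_in = nullptr, *d_out = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_in, n);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&d_out, n);
    if (err != cudaSuccess) { cudaFree(d_in); return err; }

    err = cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        kernel_foo_two_buffers<<<blocks, threads>>>(d_in, d_out, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, n, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}
```

If the kernel has to run repeatedly, the two device pointers can be swapped between launches so that the previous output becomes the next input without any extra copy.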

Jamie_K · April 29, 2009, 4:36pm

The way this is written, taken literally, no, it cannot be parallelized. Each element depends on count, which depends on the previous values.

BUT

The code as posted doesn’t make sense, though. Presumably your actual algorithm is more sensible, and there might be a way to implement it in parallel.

Quoc_Vinh · April 30, 2009, 1:54am

Could you show how the count variable is calculated?

What is the difference between “x” and “i”? Are they the same? I don’t see any declaration for “x”.

If possible, you should use two arrays: one for input and the other for output.

:)

KUNDAN_KUMAR · April 30, 2009, 5:18am

Actually, count is updated inside the switch cases. In the code I posted earlier, case 0 and case 1 reset count to 0, and the default case increments it by 1. The else branch also increments count by 1 (I did not mention this earlier because I wanted to show a different point), and x = i is set just above the if condition.
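
Read literally, that description corresponds to something like the sequential reference below. This is a sketch and not the actual code: the name foo_reference is hypothetical, it runs on the host only, and it assumes int elements instead of char so that the values 150 and 312 fit.

```cuda
#include <cstddef>
#include <vector>

// Sequential host-side reference for the logic described above.
// Assumption: int elements rather than char so that 150 and 312 fit.
void foo_reference(std::vector<int> &a)
{
    int count = 0;
    for (std::size_t i = 1; i + 1 < a.size(); ++i) {
        std::size_t x = i;  // x = i, set just above the if condition
        if (a[i] == a[i + 1]) {
            switch (count) {
            case 0:  a[x] = 0;       count = 0; break;  // reset
            case 1:  a[x + 1] = 312; count = 0; break;  // reset
            default: a[x + 1] = 0;   ++count;   break;  // increment
            }
        } else {
            a[x] = 100;
            a[x + 1] = 150;
            ++count;
        }
    }
}
```

Written this way, iteration i reads a[i], which iteration i - 1 may have just written through a[x + 1], and it also reads count, which depends on every earlier iteration. That is the dependency that stops each iteration from being handed to an independent thread as is.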

Quoc_Vinh · April 30, 2009, 1:02pm

Hi KUNDAN KUMAR,
I am really interested in your function.
I have read it many times, but I still cannot understand it clearly.
Can you post something easier to understand, like pseudocode, with as much detail as possible?
:)

Jamie_K · April 30, 2009, 1:08pm

Why don’t you describe what you are trying to do? Is this a homework problem?