Parallelize the Execution.

Hi All,

I have some code in which the input is an array and the output is written back to the same array.

__global__ void kernel_foo(char *chArray)
{
	int count = 0;

	// In the real kernel this loop is replaced by the thread index;
	// a for loop is used here to make the logic easier to follow.
	for (int i = 1; i < 312; i++)
	{
		// body here, count is also updated here

		switch (count)
		{
			case 0:
				chArray[x] = 0;

			case 1:
				// ...
		}
	}
}

My question is :

In this code, chArray is used in the condition of the if statement and is also updated inside its body. When I call this kernel, I get the wrong result.

Can this code be parallelized with threads?

You aren’t getting the correct result because threads run in a non-deterministic order and you are overwriting values in the array that other threads still need to read. Why not just put the output into a new array, say chArray2? You can ping-pong between the two arrays to avoid extra copying. For example:

cudaMalloc((void **) &d_chArray, …);
cudaMalloc((void **) &d_chArray2, …);
cudaMemcpy(d_chArray, chArray, …);
kernel_foo<<<cuda_grids, cuda_threads>>>(d_chArray, d_chArray2);
cudaMemcpy(chArray, d_chArray2, …);
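
The two-buffer idea above can be sketched as follows. This is only an illustration under stated assumptions: `step_kernel`, `run_ping_pong`, and the per-element rule (an element becomes 0 when its left neighbour is 0) are hypothetical and not taken from the original post. What it shows is that each thread reads only from the input buffer and writes only to the output buffer, and the host swaps the two pointers between launches.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical rule, for illustration only: an element becomes 0 when its
// left neighbour is 0, otherwise it keeps its value.
// All reads come from `in` and all writes go to `out`, so the order in which
// threads run cannot change the result.
__global__ void step_kernel(const char *in, char *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = (i > 0 && in[i - 1] == 0) ? 0 : in[i];
}

// Runs `steps` launches, ping-ponging between two device buffers.
// Error checking of the CUDA calls is omitted to keep the sketch short.
std::vector<char> run_ping_pong(const std::vector<char> &host, int steps)
{
    assert(!host.empty());
    int n = static_cast<int>(host.size());

    char *d_in = nullptr;
    char *d_out = nullptr;
    cudaMalloc((void **)&d_in, n);
    cudaMalloc((void **)&d_out, n);
    cudaMemcpy(d_in, host.data(), n, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        step_kernel<<<blocks, threads>>>(d_in, d_out, n);
        // The output of this step is the input of the next one;
        // only the pointers are swapped, no data is copied.
        std::swap(d_in, d_out);
    }

    // After the final swap, d_in points at the most recent result.
    std::vector<char> result(n);
    cudaMemcpy(result.data(), d_in, n, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `run_ping_pong({1, 0, 1, 1, 1}, 2)` spreads the zero one position to the right per launch. Updating a single array in place would let a thread see a neighbour's new value from the same step, which is the kind of order-dependent result described in the question.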

The way this is written, taken literally, no it cannot be parallelized. Each element depends on count, which depends on the previous values.


The code as posted does not make sense. Presumably your actual algorithm is more sensible, and there might be a way to implement it in parallel.

Could you show how the count variable is calculated?

What is the difference between “x” and “i”? Are they the same? I don’t see any declaration for the “x” variable.

If possible, you should use two arrays: one for input and one for output.


Actually, count is updated inside the switch cases. In the code I posted earlier, case 0 and case 1 reset count to 0, and the default case increments it by 1. The else branch also increments count by 1 (I did not mention this earlier because I wanted to show something else), and x = i is set just above the if condition.
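
Put together, the update rules described above might look like the sequential reference below. This is a sketch, not the poster's actual code: the `if` condition was never shown in the thread, so `chArray[x] != 0` is a placeholder assumption, as are the `break` statements and the names `update_sequential` and `kernel_foo_serial`. It makes the difficulty visible: `count` is carried from one iteration to the next, so iteration i cannot start until iteration i - 1 has finished.

```cuda
#include <cuda_runtime.h>

// Sequential reference for the rules described in the post.
// ASSUMPTIONS: the if condition (chArray[x] != 0) and the break statements
// are placeholders; the original code did not show them.
__host__ __device__ void update_sequential(char *chArray, int n)
{
    int count = 0;                 // carried across iterations
    for (int i = 1; i < n; i++) {
        int x = i;                 // "x = i above the if condition"
        if (chArray[x] != 0) {     // hypothetical condition
            switch (count) {
                case 0:
                    chArray[x] = 0;
                    count = 0;     // case 0 resets count
                    break;
                case 1:
                    count = 0;     // case 1 resets count
                    break;
                default:
                    count++;       // default case increments count
                    break;
            }
        } else {
            count++;               // else branch increments count
        }
    }
}

// Correct but not parallel: a single thread walks the whole array in order.
__global__ void kernel_foo_serial(char *chArray, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        update_sequential(chArray, n);
    }
}
```

A launch such as `kernel_foo_serial<<<1, 1>>>(d_chArray, 312);` would give a deterministic result, at the cost of using one thread. Whether the real algorithm can be restructured so that threads work independently depends on details that the thread does not show.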

I am really interested in your function.
I have read it many times, but I still cannot understand it clearly.
Could you post something easier to understand, such as pseudocode, in as much detail as possible?

Why don’t you describe what you are trying to do? Is this a homework problem?