Parallelizing an in-place shift on an array of numbers with CUDA

I’m having trouble parallelizing an operation on an array of numbers with CUDA.

For example, suppose we have an array M containing the numbers (1, 2, 3, 4, 5).

If I remove the number 2 from the array and shift everything after it to the left,

the resulting array would be (1, 3, 4, 5, 5),

where M[1] = M[2], M[2] = M[3], M[3] = M[4].

My question is: how can we do this in parallel in CUDA? When this is parallelized there might be a race condition, because the thread handling M[1] is not guaranteed to run first. If the thread for M[2] shifted first, the resulting array would become (1, 4, 4, 5, 5). Is there any method to handle this? I’m fairly new to CUDA, so I’m not sure what to do…

My current code is as follows:

```
__global__ void gpu_shiftSeam(int *MCEnergyMat, int *seam, int width, int height, int currRow)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // column
    int j = blockIdx.y * blockDim.y + threadIdx.y; // row

    int index = i + width * j; // row-major offset into the flattened 2D array

    if (i < width && j < height)
    {
        // shift values of -1 to the side of the image
        if (MCEnergyMat[index] == -1) // check if it's equal to -1, if so remove and start shifting
        {
            if (i + 1 != width) // check boundary
                MCEnergyMat[index] = MCEnergyMat[index + 1];
        }
        if (seam[j] < i) // check if it's past the "-1" shifting point, if so start shifting, else don't do anything
        {
            if (i + 1 != width) // check boundary
                MCEnergyMat[index] = MCEnergyMat[index + 1];
        }
    }
}
```

Here seam[j] contains the column index I would like to remove in row j, and MCEnergyMat is just a 2D array flattened into a 1D array (row-major). However, my code does not work, and I believe a race condition is the problem.

Thanks!

Use different arrays for input and output.
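As a minimal sketch of the two-array approach: every thread reads only from the input buffer and writes only to the output buffer, so the order in which threads run no longer matters. The names `removeSeamOutOfPlace` and `shiftSeamOutOfPlace` are made up for illustration, and the sketch assumes `seam[row]` holds a valid column index (0 <= seam[row] < width) for every row. The last column keeps its old value, matching the (1, 3, 4, 5, 5) example in the question. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: copy `in` to `out`, dropping column seam[row] in each row.
// Reads touch only `in` and writes touch only `out`, so there is no race.
__global__ void removeSeamOutOfPlace(const int *in, int *out, const int *seam,
                                     int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= width || row >= height) return;

    int idx = row * width + col;
    if (col >= seam[row] && col + 1 < width)
        out[idx] = in[idx + 1];  // at or right of the seam: take the right neighbour
    else
        out[idx] = in[idx];      // left of the seam, or last column: unchanged
}

// Hypothetical host helper: runs the kernel once and returns the shifted copy.
std::vector<int> shiftSeamOutOfPlace(const std::vector<int> &mat,
                                     const std::vector<int> &seam,
                                     int width, int height)
{
    size_t matBytes  = mat.size() * sizeof(int);
    size_t seamBytes = seam.size() * sizeof(int);
    int *dIn = nullptr, *dOut = nullptr, *dSeam = nullptr;
    cudaMalloc(&dIn, matBytes);
    cudaMalloc(&dOut, matBytes);
    cudaMalloc(&dSeam, seamBytes);
    cudaMemcpy(dIn, mat.data(), matBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dSeam, seam.data(), seamBytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    removeSeamOutOfPlace<<<grid, block>>>(dIn, dOut, dSeam, width, height);

    std::vector<int> result(mat.size());
    cudaMemcpy(result.data(), dOut, matBytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dSeam);
    return result;
}
```

If many seams are removed one after another, the two device buffers can be swapped between launches instead of being reallocated each time.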

If you have to do this in place, use a single warp for each line (looping over i), so you have full control of the order of operations within each line. This might provide insufficient parallelism to fully load large GPUs, but this operation is memory bandwidth bound anyway so it doesn’t matter that much.
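A rough sketch of what the one-warp-per-line idea could look like, under some assumptions: a warp is a group of 32 threads that the hardware schedules together, the kernel is launched with exactly 32 threads per block and one block per row, and `seam[row]` is a valid column index. The names `removeSeamInPlace` and `shiftSeamInPlace` are invented for this example. The warp walks the row left to right in chunks of 32 columns; in each chunk every lane first reads its right neighbour, then `__syncwarp()` makes sure all reads have finished before any lane writes.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpSize = 32;

// Hypothetical kernel: one warp per row, shifting in place.
// Assumes blockDim.x == 32 and gridDim.x == height.
__global__ void removeSeamInPlace(int *mat, const int *seam, int width, int height)
{
    int row  = blockIdx.x;
    int lane = threadIdx.x;
    if (row >= height) return;  // same decision for every lane in the warp

    int *line = mat + row * width;
    // The loop bounds are identical for all lanes, so every lane reaches
    // __syncwarp() the same number of times.
    for (int base = seam[row]; base + 1 < width; base += kWarpSize) {
        int col = base + lane;
        bool active = (col + 1 < width);
        int next = active ? line[col + 1] : 0;  // 1) read the right neighbour
        __syncwarp();                           // 2) all reads finish before any write
        if (active) line[col] = next;           // 3) overwrite the current column
    }
}

// Hypothetical host helper: shifts a copy of `mat` on the device and returns it.
std::vector<int> shiftSeamInPlace(std::vector<int> mat,
                                  const std::vector<int> &seam,
                                  int width, int height)
{
    size_t matBytes  = mat.size() * sizeof(int);
    size_t seamBytes = seam.size() * sizeof(int);
    int *dMat = nullptr, *dSeam = nullptr;
    cudaMalloc(&dMat, matBytes);
    cudaMalloc(&dSeam, seamBytes);
    cudaMemcpy(dMat, mat.data(), matBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dSeam, seam.data(), seamBytes, cudaMemcpyHostToDevice);

    removeSeamInPlace<<<height, kWarpSize>>>(dMat, dSeam, width, height);

    cudaMemcpy(mat.data(), dMat, matBytes, cudaMemcpyDeviceToHost);
    cudaFree(dMat);
    cudaFree(dSeam);
    return mat;
}
```

Because chunks are processed in increasing column order, the element just past the end of a chunk is still unmodified when the last lane reads it, which is what keeps the in-place shift correct across chunk boundaries.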

Hmm, could you explain a little bit more? I’m not familiar with warps. How would I use one to loop over i?

Thanks!