Converting a for loop to CUDA

Here is part of my code. I'm operating on a 2D integer array together with three 1D arrays:
data: 2D array
idlist: 1D array
tem: 1D array
tem1: 1D array

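while (/* condition omitted */)
{
    // SOME CODE HERE
    da1 = data[r][cols-1];              // reference value: last column of row r
    for (int k = 1; k < rows; k++)
    {
        r = idlist[k];
        da2 = data[r][cols-1];          // last-column value of the row named by idlist[k]
        if (da1 == da2)
            tem[val++] = idlist[k];     // same value as da1: append to tem
        else
            tem1[val1++] = idlist[k];   // different value: append to tem1
    }
    // SOME CODE HERE
}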

I have read a few examples of CUDA programs and they are understandable, but when it comes to my own program it looks very complicated. How can this kind of for loop be converted to parallel code? I don't need the exact code, just some suggestions. Does it need one or two kernels to be written? For code that flows this way, is it possible to run it in parallel using CUDA threads? Please help.

No need to reinvent the wheel by writing custom CUDA code when you can use libraries. ArrayFire is a great starting point for that. All you need to do is drop the for loop and express the same work as whole-array (matrix) operations, which is straightforward for code like yours. Ping me by email if you need any further assistance.
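To make "drop the for loop" a bit more concrete, here is a rough sketch of how the loop in the question could be split into three array-wide steps: a flag pass, a prefix sum, and a scatter. It is plain C++ that runs on the CPU, not CUDA and not ArrayFire code, and it is only meant to show the shape of the computation. Assumptions that are not in the question: `data` is stored as one flat row-major array, `idlist` has `rows` entries, `val` and `val1` both start at 0, and the names `Split` and `split_by_last_column` are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: the two output lists from the question.
struct Split {
    std::vector<int> tem;   // ids whose last-column value equals da1
    std::vector<int> tem1;  // ids whose last-column value differs from da1
};

// data: flat row-major array with `cols` columns (assumed layout).
// idlist: row ids; as in the question, index 0 is skipped.
// da1: the reference value, computed by the caller as in the question.
Split split_by_last_column(const std::vector<int>& data, std::size_t cols,
                           const std::vector<int>& idlist, int da1) {
    const std::size_t n = idlist.size();
    std::vector<int> flag(n, 0), pos_eq(n, 0), pos_ne(n, 0);

    // Step 1 (map): every k is independent, so on a GPU this could be
    // one thread per k.
    for (std::size_t k = 1; k < n; ++k) {
        const std::size_t r = static_cast<std::size_t>(idlist[k]);
        flag[k] = (data[r * cols + (cols - 1)] == da1) ? 1 : 0;
    }

    // Step 2 (exclusive prefix sum): pos_eq[k] counts the matches before k,
    // pos_ne[k] counts the non-matches before k. This replaces val++ and
    // val1++, which are the only things that make the original loop serial.
    int eq = 0, ne = 0;
    for (std::size_t k = 1; k < n; ++k) {
        pos_eq[k] = eq;
        pos_ne[k] = ne;
        if (flag[k]) ++eq; else ++ne;
    }

    // Step 3 (scatter): again independent per k, because each element
    // already knows its output slot.
    Split out;
    out.tem.resize(static_cast<std::size_t>(eq));
    out.tem1.resize(static_cast<std::size_t>(ne));
    for (std::size_t k = 1; k < n; ++k) {
        if (flag[k]) out.tem[static_cast<std::size_t>(pos_eq[k])] = idlist[k];
        else         out.tem1[static_cast<std::size_t>(pos_ne[k])] = idlist[k];
    }
    return out;
}
```

Steps 1 and 3 have no dependency between iterations, so each would plausibly become a simple kernel, while step 2 is a standard scan that GPU libraries usually provide. This pattern is often called stream compaction or a stable partition, and it keeps the output in the same order as the original loop.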

Yes, I will try it that way. Thank you :)