 # optimization of a special double loop

Hi everybody,

First of all, thank you for all the information you provide in these forums, and sorry for my english too ;-)

My problem is simple. I need to scan two vector (sorted) and check if the sum of every index is equal to the value i’m looking for. Something like this:

``````int k = 0:tam1+tam2;

for (i=0; i<tam1; i++)

if (i < k) {

for (j=0, j<tam2; j++)

if ( i + j = k ) [store the result]

else if ( i + j > k) break;		  // vectors are sorted <=

else k++;
``````

The question is how to do this efficiently. The size of my vectors suppose that is 1024, so for example I could divide 2048/64, each one of the 64 threads launched to cover the whole vector scan their part th0: 0 to 32, th1: 33 to 64, etc so treadIdx.x would be the k variable. Each kernel do a double loop to find the values. And also, I want to do this with several points, so and apropiate grid could be (1, num_points) and the block (64). Something like I show next ( simplified ).

``````__global__ void scan( params ) {

int point = blockIdx.y;

int i1 = 0;

while ( !end ) {

if ( vector1[i1 + point*tam] < k ) {			  // the values por every point are split tam by tam - it's the jump

int i2 = abs(k - vector1[i1  + point*tam]);

int i3 = i2;

while ( !find ) {

if (vector2[i3  + point*tam] == i2)	{ [store result]; 	find = true; }

else if (vector2[i3  + point*tam] < i2 ) { break; i1++; }

i3--;

}

}

else { i1 = 0;  k = k + 1; }

end = (k == (id_th + 1)*size_part);

}
``````

I have done this but it still slow, and I think there must be something more efficient to do this. If you understand me and can help me, I would be very glad.

Thank you.

You should look in the SDK for reduction and scan samples.

eyal