Hi everybody,
First of all, thank you for all the information you provide in these forums, and sorry for my english too ;-)
My problem is simple. I need to scan two vector (sorted) and check if the sum of every index is equal to the value i’m looking for. Something like this:
int k = 0:tam1+tam2;
for (i=0; i<tam1; i++)
if (i < k) {
for (j=0, j<tam2; j++)
if ( i + j = k ) [store the result]
else if ( i + j > k) break; // vectors are sorted <=
else k++;
The question is how to do this efficiently. The size of my vectors suppose that is 1024, so for example I could divide 2048/64, each one of the 64 threads launched to cover the whole vector scan their part th0: 0 to 32, th1: 33 to 64, etc so treadIdx.x would be the k variable. Each kernel do a double loop to find the values. And also, I want to do this with several points, so and apropiate grid could be (1, num_points) and the block (64). Something like I show next ( simplified ).
__global__ void scan( params ) {
int k = threadIdx.x*size_part;
int point = blockIdx.y;
int id_th = threadIdx.x;
int i1 = 0;
while ( !end ) {
if ( vector1[i1 + point*tam] < k ) { // the values por every point are split tam by tam - it's the jump
int i2 = abs(k - vector1[i1 + point*tam]);
int i3 = i2;
while ( !find ) {
if (vector2[i3 + point*tam] == i2) { [store result]; find = true; }
else if (vector2[i3 + point*tam] < i2 ) { break; i1++; }
i3--;
}
}
else { i1 = 0; k = k + 1; }
end = (k == (id_th + 1)*size_part);
}
I have done this but it still slow, and I think there must be something more efficient to do this. If you understand me and can help me, I would be very glad.
Thank you.