integer comparison

Hi,

I wrote a simple program for integer comparison between two arrays, but I can't understand why it isn't working.

I have two integer arrays, A[112] and B[112].

I want to compare every element of A with every element of B (i.e. 112*112 comparisons), then filter and report only the indices of A and B whose elements are the same.

In my final output I am getting all indices. Please help me. Here is my code:

#include "string.h"
#include "stdio.h"
#include "stdlib.h"
#include "cutil.h"
#include "time.h"

__global__ void compare_arrays_gpu(int *in1, int *in2, int *compout, int *seq1out, int *seq2out, int seq11, int seq22)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    /* row stride is the length of in1 (same as seq22 here, since both are 112) */
    int index = idx + idy * seq11;
    if (idx < seq11 && idy < seq22) {
        if (in1[idx] == in2[idy]) {
            compout[index] = 1;
            seq1out[index] = in1[idx];
            seq2out[index] = in2[idy];
        }
    }
}
int main()
{
    CUT_DEVICE_INIT();

    /* Parameters */
    int i;
    int seq1_len = 112;
    int seq2_len = 112;
    time_t timer1;
    timer1 = time(NULL);

    /* Allocate the input and output arrays on the host */
    int *seq1parts;
    int *seq2parts;
    int *compare;
    int *seq1;
    int *seq2;

    seq1parts = (int*) malloc((seq1_len)*sizeof(int));
    seq2parts = (int*) malloc((seq2_len)*sizeof(int));
    compare = (int*) malloc((seq2_len)*(seq1_len)*sizeof(int));
    seq1 = (int*) malloc((seq2_len)*(seq1_len)*sizeof(int));
    seq2 = (int*) malloc((seq2_len)*(seq1_len)*sizeof(int));

    /* Fill both input arrays with 0..len-1 */
    int k;
    for (k = 0; k < seq1_len; k++) {
        seq1parts[k] = k;
    }

    for (k = 0; k < seq2_len; k++) {
        seq2parts[k] = k;
    }

    /* Pointers to device memory */
    int *seq1parts_d;
    int *seq2parts_d;
    int *compare_d;
    int *seq1_d;
    int *seq2_d;

    /* Allocate the arrays on the device */
    cudaMalloc((void **) &seq1parts_d, sizeof(int)*(seq1_len));
    cudaMalloc((void **) &seq2parts_d, sizeof(int)*(seq2_len));
    cudaMalloc((void **) &compare_d, sizeof(int)*(seq2_len)*(seq1_len));
    cudaMalloc((void **) &seq1_d, sizeof(int)*(seq2_len)*(seq1_len));
    cudaMalloc((void **) &seq2_d, sizeof(int)*(seq2_len)*(seq1_len));

    /* Copy data from host memory to device memory */
    cudaMemcpy(seq1parts_d, seq1parts, sizeof(int)*(seq1_len), cudaMemcpyHostToDevice);
    cudaMemcpy(seq2parts_d, seq2parts, sizeof(int)*(seq2_len), cudaMemcpyHostToDevice);

    /* Compute the execution configuration */
    //int nblocks=(((seq1_len)*(seq2_len))+255)/256;
    int blocksize = 16;
    dim3 dimBlock(blocksize, blocksize);
    dim3 dimGrid((seq1_len)/dimBlock.x, (seq2_len)/dimBlock.y);

    /* Compare the two arrays on the device */
    compare_arrays_gpu<<<dimGrid,dimBlock>>>(seq1parts_d, seq2parts_d, compare_d, seq1_d, seq2_d, (seq1_len), (seq2_len));

    CUT_CHECK_ERROR("Kernel function failed");

    /* Copy data from device memory to host memory */
    cudaMemcpy(compare, compare_d, sizeof(int)*(seq1_len)*(seq2_len), cudaMemcpyDeviceToHost);
    cudaMemcpy(seq1, seq1_d, sizeof(int)*(seq1_len)*(seq2_len), cudaMemcpyDeviceToHost);
    cudaMemcpy(seq2, seq2_d, sizeof(int)*(seq1_len)*(seq2_len), cudaMemcpyDeviceToHost);
    time_t timer2;
    timer2 = time(NULL);
    printf("%f\n", difftime(timer2, timer1));

    /* Print the results */
    for (i = 0; i < (seq1_len)*(seq2_len); i++)
        printf("%d %d %d \n", seq1[i], seq2[i], compare[i]);

    /* Free the memory */
    free(seq1parts); free(seq2parts); free(compare); free(seq1); free(seq2);
    CUDA_SAFE_CALL(cudaFree(seq1parts_d)); CUDA_SAFE_CALL(cudaFree(seq2parts_d)); CUDA_SAFE_CALL(cudaFree(compare_d)); CUDA_SAFE_CALL(cudaFree(seq1_d)); CUDA_SAFE_CALL(cudaFree(seq2_d));
}

for(k=0; k<seq1_len; k++){
seq1parts[k]=k;
}

for(k=0; k<seq2_len; k++){
seq2parts[k]=k;
}

You initialize both of them the same way?

Yes, correct. That means I have the same numbers in the seq1 and seq2 arrays.

I think that’s what Ced’ was trying to tell you:

If you have the same numbers in both arrays, what other output than all indices do you expect?

Thank you so much for all your replies.

My problem is something different.

For example:

We have two arrays A and B: A={1,2,3}, B={1,2,3}

Then the number of comparisons is 9 (3*3), i.e. compare(A[1], B[1]); compare(A[1], B[2]); compare(A[1], B[3]); and so on.

If the two elements being compared are the same, compare returns 1, otherwise 0.

if (compare(A[i], B[j]) == 1) {
    indexA = i;   /* A's index */
    indexB = j;   /* B's index */
}

I want to return the indexA and indexB for which compare(A[i], B[j]) is 1.

I am sorry for the confusion. I hope it is clear now.
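For reference, the all-pairs comparison described in this post can be sketched on the CPU in plain C. This is only an illustrative sketch: the function name find_matching_pairs and the output arrays out_i and out_j are hypothetical and not part of the original code, and indices are 0-based as in C.

```c
#include <stddef.h>

/* Hypothetical helper: records every (i, j) with a[i] == b[j].
   out_i and out_j must each have room for a_len * b_len entries.
   Returns the number of matching pairs found. */
size_t find_matching_pairs(const int *a, size_t a_len,
                           const int *b, size_t b_len,
                           size_t *out_i, size_t *out_j)
{
    size_t count = 0;
    for (size_t i = 0; i < a_len; i++) {
        for (size_t j = 0; j < b_len; j++) {
            if (a[i] == b[j]) {
                out_i[count] = i;  /* index into a */
                out_j[count] = j;  /* index into b */
                count++;
            }
        }
    }
    return count;
}
```

With A = B = {1,2,3} this performs 9 comparisons and reports 3 pairs: (0,0), (1,1) and (2,2). A result like this is a handy baseline to check the GPU output against.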

Okay, if I understand you correctly, the only thing you need is a matrix, which would look like this in your example:

aindex 0 1 2 3 4 5
bindex
0      1 0 0 0 0 0
1      0 1 0 0 0 0
2      0 0 1 0 0 0
3      0 0 0 1 0 0
4      0 0 0 0 1 0
5      0 0 0 0 0 1

And it looks like this is exactly what you are doing.

Also, if you only need the indices, there's no need to store the a and b values; you can just read them from the input arrays after the comparison.

One thing to mention: you should set the values to 0 if a[x] != b[y], or initialize the buffers beforehand; otherwise you might copy back uninitialized values.
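As a rough host-side sketch of the two points above: once the comparison matrix has been copied back, the matching index pairs can be recovered from the flat position alone, without storing the a and b values. The helper name print_matches is hypothetical, and the sketch assumes the layout compare[idx + idy * a_len] with every non-matching entry set to 0 (for instance by zeroing the device buffer with cudaMemset before the kernel launch, or by writing 0 in the kernel).

```c
#include <stdio.h>

/* Hypothetical helper: scans a flat comparison matrix laid out as
   compare[idx + idy * a_len] and prints only the matching index pairs.
   Returns the number of matches. */
int print_matches(const int *compare, int a_len, int b_len)
{
    int matches = 0;
    for (int index = 0; index < a_len * b_len; index++) {
        if (compare[index] == 1) {
            int idx = index % a_len;  /* position in A */
            int idy = index / a_len;  /* position in B */
            printf("A[%d] == B[%d]\n", idx, idy);
            matches++;
        }
    }
    return matches;
}
```

If the buffer is not initialized, entries that the kernel never wrote hold arbitrary values, so a filter like this can report pairs that never matched.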

Thank you so much for your kind time Vrahok. I greatly appreciate all your kind help.

Please check my device code and tell me whether I am doing anything wrong. It is printing all indices.

DEVICE code:

__global__ void compare_arrays_gpu(int *A, int *B, int *Aindex_d, int *Bindex_d, int A_length, int B_length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    /* row stride is the length of A */
    int index = idx + idy * A_length;
    if (idx < A_length && idy < B_length) {
        if (A[idx] == B[idy]) {
            Aindex_d[index] = A[idx];
            Bindex_d[index] = B[idy];
        }
    }
}

When I print Aindex and Bindex on the host, it prints out all indices.

Host code:

int main(){
    int *Aindex_host;
    int *Bindex_host;

    Aindex_host = (int*) malloc((A_len)*(B_len)*sizeof(int));
    Bindex_host = (int*) malloc((A_len)*(B_len)*sizeof(int));

    int *Aindex_d;
    int *Bindex_d;

    /* Allocate the index arrays on the device */
    cudaMalloc((void **) &Aindex_d, sizeof(int)*(A_len)*(B_len));
    cudaMalloc((void **) &Bindex_d, sizeof(int)*(B_len)*(A_len));

    compare_arrays_gpu<<<dimGrid,dimBlock>>>(A, B, Aindex_d, Bindex_d, A_len, B_len);

    cudaMemcpy(Aindex_host, Aindex_d, sizeof(int)*(A_len)*(B_len), cudaMemcpyDeviceToHost);
    cudaMemcpy(Bindex_host, Bindex_d, sizeof(int)*(A_len)*(B_len), cudaMemcpyDeviceToHost);

    /* Print the results */
    for (i = 0; i < (A_len)*(B_len); i++)
        printf("%d %d \n", Aindex_host[i], Bindex_host[i]);
}

I would suggest some device code like this:

__global__ void compare_arrays_gpu(int *A, int *B, int *compare, int A_length, int B_length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    int index = idx + idy * A_length;
    int result = 0;

    if ((idx < A_length) && (idy < B_length))
    {
        if (A[idx] == B[idy])
        {
            result = 1;
        }
        compare[index] = result;
    }
}

This would give you the matrix mentioned in my earlier post.

You could remove the "if((idx < A_length) && (idy < B_length))" check if your data fits the grid exactly, or if you run "padding comparisons" so that the problem fits the grid. This might give you some extra performance.
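Whether the data fits the grid exactly comes down to integer division of the array length by the block size. The following is a small sketch of the usual ceiling-division calculation, in the same spirit as the commented-out (n+255)/256 line in the original program; the helper name blocks_for is hypothetical.

```c
/* Hypothetical helper: number of blocks needed to cover n elements
   with blocks of block_size threads (ceiling division). */
int blocks_for(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}
```

For 112 elements and 16 threads per block this gives 7, the same as the plain 112/16 used for dimGrid, so the grid fits exactly. For a length such as 100, plain division gives 6 blocks (96 threads) and the last 4 elements would never be compared, while the ceiling version gives 7 blocks (112 threads), in which case the bounds check in the kernel is needed to keep the extra 12 threads from reading past the end of the arrays.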

For example, with A = {1,2,3,4} and B = {4,1,3}:

compare:
0 0 0 1
1 0 0 0
0 0 1 0
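To check a matrix like this by hand, the same indexing can be run on the CPU. This is a sketch only: compare_arrays_cpu is a hypothetical name, and the two loops simply stand in for the threads of the suggested kernel, with each (idx, idy) pair playing the role of one thread.

```c
/* CPU reference for the suggested kernel: each (idx, idy) pair plays the
   role of one GPU thread. compare must hold A_length * B_length ints. */
void compare_arrays_cpu(const int *A, const int *B, int *compare,
                        int A_length, int B_length)
{
    for (int idy = 0; idy < B_length; idy++) {
        for (int idx = 0; idx < A_length; idx++) {
            int index = idx + idy * A_length;  /* row idy, column idx */
            compare[index] = (A[idx] == B[idy]) ? 1 : 0;
        }
    }
}
```

Each row of the result corresponds to one element of B and each column to one element of A, so for A = {1,2,3,4} and B = {4,1,3} the first row marks A[3] (the value 4), the second marks A[0] and the third marks A[2].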

I still don’t know if this is exactly what you need, but that’s a suggestion how I would solve your problem (as I understood it)

Thank you so much for your kind help and time.

I greatly appreciate your kind help.

I understand it completely now.