Help: adding multiple vectors, undetermined number and size

I’m a beginner to parallel computing and OpenCL. I would like to figure out how to write a kernel, or multiple kernels, to add a variable number of vectors of variable size. I can’t find any examples that demonstrate the tricks necessary to accomplish this. The serial version is:

for (n = 0; n < SmallNumber; ++n) {
    for (n2 = 0; n2 < LargeNumber; ++n2) {
        A[n2] += B[n][n2];
    }
}

I realize that you cannot pass a two-dimensional array to an OpenCL kernel, so I changed it to this:

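int n, n2, n3, z, x = 12, y = 20000000;
int A[y];       // arrays this large need heap or static storage in practice
int B[x][y];
int B1d[x*y];   // flattened, row-major copy of B

//initialize A...

//convert B to one dimension: B1d[n*y + n3] = B[n][n3]
for (n = 0, z = 0; n < x; ++n, z += y) {
    for (n2 = z, n3 = 0; n2 < z + y; ++n2, ++n3) {
        B1d[n2] = B[n][n3];
    }
}

//add each row of the flattened B into A
for (n = 0, z = 0; n < x; ++n, z += y) {
    for (n2 = z, n3 = 0; n2 < z + y; ++n2, ++n3) {
        A[n3] += B1d[n2];
    }
}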

So now I don’t have the problem with two-dimensional arrays, but I think there are a lot of other issues I’ll need to address.
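The flattening above is ordinary row-major indexing: element `B[n][n3]` ends up at index `n*y + n3`. Below is a minimal sketch of the same idea with heap allocation and small sizes, so it can be run on the host. The names `flat_index`, `add_rows` and `demo_sum` are made up for illustration, and `size_t` is used on the assumption that the index could outgrow an `int` for larger inputs.

```c
#include <assert.h>
#include <stdlib.h>

/* Row-major flattening: element (row, col) of a rows x cols matrix
   lives at index row * cols + col. */
size_t flat_index(size_t row, size_t col, size_t cols) {
    return row * cols + col;
}

/* Adds every row of the flattened matrix B1d into A (A has cols elements). */
void add_rows(int *A, const int *B1d, size_t rows, size_t cols) {
    for (size_t row = 0; row < rows; ++row)
        for (size_t col = 0; col < cols; ++col)
            A[col] += B1d[flat_index(row, col, cols)];
}

/* Hypothetical demo: fills B with B[r][c] = 10*r + c on the heap, sums the
   rows into a zeroed A, and returns A[col] (or -1 if allocation fails). */
int demo_sum(size_t rows, size_t cols, size_t col) {
    assert(col < cols);
    int *A = calloc(cols, sizeof *A);
    int *B1d = malloc(rows * cols * sizeof *B1d);
    int result = -1;
    if (A != NULL && B1d != NULL) {
        for (size_t r = 0; r < rows; ++r)
            for (size_t c = 0; c < cols; ++c)
                B1d[flat_index(r, c, cols)] = (int)(10 * r + c);
        add_rows(A, B1d, rows, cols);
        result = A[col];
    }
    free(A);
    free(B1d);
    return result;
}
```

Filling the flattened array directly, instead of building `B[x][y]` first and copying, would also avoid holding the data in memory twice.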

Does anyone have any suggestions or examples? It seems like it should be a fairly simple process, but I’ve become kind of confused trying to figure this out. This is the kernel I have in mind:

__kernel void openCL_Kernel(__global int *A,
                            __global int **B,
                            __global int *C)
{
    int i = get_global_id(0);
    int ii = get_global_id(1);
    A[i] += B[ii][i];
}

Other than the fact that I cannot pass a two-dimensional pointer, would this be equivalent to the loops above, assuming I define the work sizes appropriately?
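One caveat with the 2-D version: every work-item that shares the same `get_global_id(0)` would update the same `A[i]` with no synchronization, which looks like a data race, so the result would probably not match the serial loops. A minimal sketch of one alternative follows, assuming a 1-D range with one work-item per element of `A` and a flattened, row-major `B`. It is written as plain C so it can be run on the host. `add_columns_work_item` and `launch_1d` are hypothetical names; in OpenCL C the first function would be the `__kernel` and `gid` would come from `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Body of one work-item: sums column gid of the flattened matrix into A[gid].
   Only this work-item ever writes A[gid], so no two work-items collide. */
void add_columns_work_item(size_t gid, int *A, const int *B1d,
                           size_t rows, size_t cols) {
    if (gid >= cols) return;              /* guard for a padded global size */
    int sum = A[gid];
    for (size_t row = 0; row < rows; ++row)
        sum += B1d[row * cols + gid];     /* row-major flattened B */
    A[gid] = sum;
}

/* Serial stand-in for a 1-D launch of global_size work-items. */
void launch_1d(size_t global_size, int *A, const int *B1d,
               size_t rows, size_t cols) {
    assert(global_size >= cols);          /* every column needs a work-item */
    for (size_t gid = 0; gid < global_size; ++gid)
        add_columns_work_item(gid, A, B1d, rows, cols);
}
```

With this layout, neighbouring work-items read neighbouring elements of each row, and the number of rows and columns can be passed as ordinary kernel arguments, so neither has to be known at compile time.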

Edit: I just realized that I would exceed the OpenCL limit on the size of a single buffer if I tried to pass one vector holding all the data.
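One possible way around a per-buffer limit (as far as I can tell, a device reports it as `CL_DEVICE_MAX_MEM_ALLOC_SIZE` through `clGetDeviceInfo`) is to hand over the data in pieces and accumulate into `A` piece by piece. The sketch below shows only the chunking arithmetic in plain C. `add_chunk` is a hypothetical stand-in for "copy one chunk to the device and run the kernel on it", and `max_chunk` is an assumed element count chosen to fit under the limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one transfer plus kernel launch:
   adds chunk[0..len) into A[offset..offset+len). */
void add_chunk(int *A, size_t offset, const int *chunk, size_t len) {
    for (size_t i = 0; i < len; ++i)
        A[offset + i] += chunk[i];
}

/* Walks the rows x cols data in pieces of at most max_chunk elements,
   so no single piece has to hold all of B. */
void add_rows_chunked(int *A, const int *B1d, size_t rows, size_t cols,
                      size_t max_chunk) {
    assert(max_chunk > 0);
    for (size_t row = 0; row < rows; ++row) {
        for (size_t start = 0; start < cols; start += max_chunk) {
            size_t remaining = cols - start;
            size_t len = remaining < max_chunk ? remaining : max_chunk;
            add_chunk(A, start, B1d + row * cols + start, len);
        }
    }
}
```

With the sizes from my example, a single row of 20,000,000 ints is about 80 MB, so one row per transfer may already fit on many devices.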

Do you think this problem would be significantly easier using CUDA?