I’m a beginner at parallel computing and OpenCL. I would like to figure out how to write a kernel, or multiple kernels, to add a variable number of vectors of variable size. I can’t find any examples that demonstrate the techniques needed to accomplish this.
for (n = 0; n < SmallNumber; ++n) {
    for (n2 = 0; n2 < LargeNumber; ++n2) {
        A[n2] += B[n][n2];
    }
}
I realize that you cannot pass a two-dimensional array to an OpenCL kernel, so I changed it to this:
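int n, n2, n3, z, x = 12, y = 20000000;
int A[y];
int B[x][y];
int B1d[x * y];  // B flattened to one dimension, row after row
// initialize A...
// convert B to one dimension: row n occupies B1d[n*y] .. B1d[n*y + y - 1]
for (n = 0, z = 0; n < x; ++n, z += y) {
    for (n2 = z, n3 = 0; n2 < z + y; ++n2, ++n3) {
        B1d[n2] = B[n][n3];
    }
}
// add each row of the flattened B into A
for (n = 0, z = 0; n < x; ++n, z += y) {
    for (n2 = z, n3 = 0; n2 < z + y; ++n2, ++n3) {
        A[n3] += B1d[n2];
    }
}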
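As a side note on the flattened layout, here is a minimal sketch that skips the copy step by allocating B as one contiguous heap block from the start and indexing it as row * cols + col. The helper names alloc_flat and add_rows_flat are made up for illustration, and the sketch assumes the data is stored row after row as in the conversion loop above. Heap allocation is used because arrays of this size generally do not fit on the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed rows x cols matrix as one
   contiguous block. Returns NULL on overflow or allocation failure. */
static int *alloc_flat(size_t rows, size_t cols)
{
    if (cols != 0 && rows > SIZE_MAX / cols)
        return NULL;                      /* rows * cols would overflow */
    return calloc(rows * cols, sizeof(int));
}

/* Hypothetical helper: add every row of the row-major matrix B
   (rows x cols) into A, which holds cols elements. */
static void add_rows_flat(int *A, const int *B, size_t rows, size_t cols)
{
    assert(A != NULL && B != NULL);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            A[c] += B[r * cols + c];      /* element (r, c) of B */
        }
    }
}
```

With this layout the same pointer can be handed to an OpenCL buffer without any intermediate two-dimensional array.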
So now I don’t have the problem with two-dimensional vectors, but I think there are a lot of other issues I’ll need to address.
Does anyone have any suggestions or examples? It seems like it should be a fairly simple process, but I’ve become rather confused trying to figure this out.
__kernel void openCL_Kernel(__global int *A,
                            __global int **B,
                            __global int *C)
{
    int i = get_global_id(0);
    int ii = get_global_id(1);
    A[i] += B[ii][i];
}
Other than the fact that I cannot pass a two-dimensional pointer, would this be equivalent, assuming I define the work sizes appropriately?
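One thing to watch for with a two-dimensional range: every work-item that shares the same get_global_id(0) would update the same A[i] at the same time, which is a data race, so the result would probably not match the serial loops. A minimal sketch of one alternative is a one-dimensional range with one work-item per element of A, where each work-item loops over the rows of a flattened B. The kernel name add_rows and the rows/cols parameters are assumptions for illustration. The #ifndef block is only a host-side stand-in so the kernel body can be tried as plain C; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-side emulation only: stand-ins for what OpenCL C provides. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t emulated_gid;  /* hypothetical: index of the current work-item */
static size_t get_global_id(unsigned int dim) { (void)dim; return emulated_gid; }
#endif

/* One work-item per column i: sums column i of the row-major matrix B
   (rows x cols) into A[i]. No two work-items write the same element. */
__kernel void add_rows(__global int *A, __global const int *B,
                       int rows, int cols)
{
    size_t i = get_global_id(0);
    if (i >= (size_t)cols)
        return;                           /* guard for padded global sizes */
    int sum = A[i];
    for (int r = 0; r < rows; ++r)
        sum += B[(size_t)r * (size_t)cols + i];
    A[i] = sum;
}

#ifndef __OPENCL_VERSION__
/* Host-side emulation only: run the kernel body once per work-item. */
static void run_add_rows(int *A, const int *B, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    for (emulated_gid = 0; emulated_gid < (size_t)cols; ++emulated_gid)
        add_rows(A, B, rows, cols);
}
#endif
```

On a real device the global work size would be the number of columns (LargeNumber), and the loop over rows (SmallNumber) stays inside each work-item, so neighbouring work-items read neighbouring elements of B.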
Edit: I just realized that I would exceed the OpenCL limit on the size of a single buffer if I tried to pass one vector holding all the data (12 × 20,000,000 ints is roughly 960 MB with 4-byte ints).
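Assuming the limit in question is the device's maximum single allocation size (reported by clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE), one possible workaround is to send B in chunks of whole rows and accumulate into A once per chunk. The sketch below only works out the chunk sizes; the helper names rows_per_chunk and chunk_count are hypothetical, and the 256 MiB figure in the usage note is an illustrative value, not a measured one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many whole rows of cols elements fit into one
   buffer of at most max_alloc_bytes. Returns 0 if a single row is too big. */
static size_t rows_per_chunk(unsigned long long max_alloc_bytes,
                             size_t cols, size_t elem_size)
{
    assert(cols > 0 && elem_size > 0);
    unsigned long long row_bytes = (unsigned long long)cols * elem_size;
    return (size_t)(max_alloc_bytes / row_bytes);
}

/* Hypothetical helper: number of chunks needed to cover all rows,
   rounding up so a final partial chunk is counted. */
static size_t chunk_count(size_t rows, size_t per_chunk)
{
    assert(per_chunk > 0);
    return (rows + per_chunk - 1) / per_chunk;
}
```

For example, with a 256 MiB limit (268435456 bytes), 20,000,000 columns and 4-byte ints, each row takes 80,000,000 bytes, so 3 rows fit per chunk and the 12 rows need 4 chunks.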
Do you think this problem would be significantly easier using CUDA?