I have an algorithm that requires me to index 2-dimensional arrays out of order, that is, to do computation on specific elements that are not necessarily in sequence. For example, the order of computation might be
result = A[1][3] + B[3][1]
result2 = A[2][4] + B[4][0]
…etc…
Is this possible in CUDA? So far, most of the examples I have seen require the entire array to be processed at once, in order, and with corresponding elements (A[0][1] + B[0][1], etc.).
Currently I have this kernel code. It is mostly still C++, but I am trying to port it to CUDA:
// set index for L_h
int index_h = 0;
double num;
double den;
for (i = 0; i < SIZEY-1; i++) {
    for (j = SIZEX-2; j > -1; j--) {
        num = exp(L_r_d[i][j]+L_v_d[i][j]+L_r_d[i])+1;
        den = exp(L_r_d[i][j]+L_v_d[i][j])+exp(L_r_d[i]);
        L_h_d[i][index_h] = log10(num/den);
        index_h++;
    }
    index_h = 0;
}
You can see that I cannot simply use
int i = threadIdx.x;
int j = threadIdx.y;
C[j*N + i] = A[j*N + i] + B[j*N + i];
to index the arrays, because I need to do it out of order.
Thanks, but exactly how does one create threads that index specific elements in the array out of order? Most of the references online that I can find typically say
ElementIndex = Array Width * Y co-ordinate + X co-ordinate
int i = threadIdx.x;
int j = threadIdx.y;
array[j*SIZEX + i] = array[j*SIZEX + i] + 5;
But I need to selectively index out of order. Maybe I simply cannot do it in one line, and need double loops as I would in regular C++?
Can anyone help me get started? I don't even know how to index the last element using this CUDA-speak.
You can calculate the element index absolutely any way you please:
ElementIndex = any C++ statement involving any variables
You probably cannot find the example you are looking for because the element indexing is going to be very different from problem to problem. The standard examples you are finding are tuned for coalesced memory access, so they all appear very similar.
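To make the "any index you please" point concrete, here is a minimal sketch of the A[1][3] + B[3][1] style of access from the question. It assumes the pairs to combine are known up front and stored as flat row * width + col indices in two lookup tables. The names gather_add, run_gather_add, a_idx and b_idx are made up for illustration, and error checking is left out for brevity. For 5-wide arrays, A[1][3] + B[3][1] becomes a_idx = 8, b_idx = 16.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Thread t adds one arbitrary element of A to one arbitrary element of B.
// a_idx and b_idx are hypothetical lookup tables of flat (row * width + col)
// indices; nothing requires them to be in order or to match each other.
__global__ void gather_add(const double *A, const double *B,
                           const int *a_idx, const int *b_idx,
                           double *result, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < n) {
        result[t] = A[a_idx[t]] + B[b_idx[t]];
    }
}

// Hypothetical host wrapper: copy inputs over, launch, copy results back.
// Assumes a_idx and b_idx have the same length. No error checking.
std::vector<double> run_gather_add(const std::vector<double> &A,
                                   const std::vector<double> &B,
                                   const std::vector<int> &a_idx,
                                   const std::vector<int> &b_idx)
{
    const int n = static_cast<int>(a_idx.size());
    std::vector<double> result(n);
    if (n == 0) return result;

    double *dA, *dB, *dR;
    int *dAi, *dBi;
    cudaMalloc(&dA, A.size() * sizeof(double));
    cudaMalloc(&dB, B.size() * sizeof(double));
    cudaMalloc(&dR, n * sizeof(double));
    cudaMalloc(&dAi, n * sizeof(int));
    cudaMalloc(&dBi, n * sizeof(int));

    cudaMemcpy(dA, A.data(), A.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dAi, a_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dBi, b_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    gather_add<<<blocks, threads>>>(dA, dB, dAi, dBi, dR, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), dR, n * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dR); cudaFree(dAi); cudaFree(dBi);
    return result;
}
```

Scattered reads like this will generally not be coalesced, so they can be slower than the textbook pattern, but they are perfectly legal.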
Here is a watered-down example of some code I have that writes to a list in an out-of-order way. Hopefully it will help you see at least one way this can be done. Again, the pattern used is application specific, so you have to come up with the right pattern for your application.
// simple example of out-of-order access pattern (pseudocode)
int idx = threadIdx.x + blockIdx.x * blockDim.x;
int cur_height = 0;
// int *list is a pointer to a pitch by height array
for (int i = 0; i < something; i++)
{
    int val = some calculations......
    if (val meets some condition)
    {
        // write into column idx, at the next free row for this thread
        list[cur_height*pitch + idx] = val;
        cur_height++;
    }
}
So what I’ve got here is 1 thread per element idx. And based on some complicated calculations (which are not important here) a value may or may not be added to the column idx in a matrix.
Hope this example helps you see how things can be done.
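Applied to the double loop in the original question, the same idea might look like the sketch below: one thread per (i, j) pair, with the output column computed as SIZEX-2-j, which is the value index_h takes in the loop. Treat it as a sketch under assumptions, not the poster's actual kernel. The arrays are assumed to be flattened row-major, the per-row term written as L_r_d[i] in the question is ambiguous so it is passed here as a separate hypothetical array row_term, and the names compute_L_h and run_compute_L_h are invented. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// One thread per (i, j). Reads element (i, j) and writes to the mirrored
// column (size_x - 2 - j), matching index_h in the original loop.
// row_term is an ASSUMED stand-in for the ambiguous L_r_d[i] term.
__global__ void compute_L_h(const double *L_r, const double *L_v,
                            const double *row_term, double *L_h,
                            int size_x, int size_y)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i < size_y - 1 && j < size_x - 1) {
        double s   = L_r[i * size_x + j] + L_v[i * size_x + j];
        double num = exp(s + row_term[i]) + 1.0;
        double den = exp(s) + exp(row_term[i]);
        int index_h = (size_x - 2) - j;             // out-of-order target
        L_h[i * size_x + index_h] = log10(num / den);
    }
}

// Hypothetical host wrapper. Returns a size_x * size_y array; entries the
// kernel never writes (last row, last column) stay zero.
std::vector<double> run_compute_L_h(const std::vector<double> &L_r,
                                    const std::vector<double> &L_v,
                                    const std::vector<double> &row_term,
                                    int size_x, int size_y)
{
    const size_t count = static_cast<size_t>(size_x) * size_y;
    assert(L_r.size() == count && L_v.size() == count);
    assert(row_term.size() == static_cast<size_t>(size_y));

    const size_t bytes = count * sizeof(double);
    double *dR, *dV, *dT, *dH;
    cudaMalloc(&dR, bytes);
    cudaMalloc(&dV, bytes);
    cudaMalloc(&dT, size_y * sizeof(double));
    cudaMalloc(&dH, bytes);
    cudaMemcpy(dR, L_r.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dV, L_v.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dT, row_term.data(), size_y * sizeof(double),
               cudaMemcpyHostToDevice);
    cudaMemset(dH, 0, bytes);  // all-zero bytes represent 0.0

    dim3 block(16, 16);
    dim3 grid((size_x + block.x - 1) / block.x,
              (size_y + block.y - 1) / block.y);
    compute_L_h<<<grid, block>>>(dR, dV, dT, dH, size_x, size_y);

    std::vector<double> L_h(count);
    cudaMemcpy(L_h.data(), dH, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dR); cudaFree(dV); cudaFree(dT); cudaFree(dH);
    return L_h;
}
```

Because every thread writes to a distinct location, no loop-carried counter like index_h is needed; each thread derives its own target index from j.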
I played around with various ways of indexing and was able to hack up a crude kernel for my algorithm. It is not ideal (I basically index each computation manually rather than using loops), but since the arrays in my particular application are not big, it should suffice for now.
hey i graduated from umich in 04, are you in some windowless room in EECS? heh =P
I bet you Mr. Anderson doesn’t know what you are talking about. First of all, he’s a physicist. Secondly, he only recently moved to Umich; he was in Ames Lab. in Iowa previously.
You’ve got it, I wouldn’t even know where EECS is :) And my office space has lots of windows overlooking the quad :) Physicists working in theory always end up on the top floor because the experimentalists 1) can’t easily get their huge instruments up the stairs and 2) the vibrations from the building increase as you go up and they can seriously mess up the experiments.