I have an array of 24000 elements and their positions in 3D space. I need to calculate the distance from every element to all the others. The elements at the beginning of the array account for the bulk of the calculations, and the elements at the end of the array account for the fewest. The sequential code looks like this:
For x: 1 to 24000
For y:x+1 to 24000
Calculate distance between x and y
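For reference, here is a minimal runnable sketch of that sequential loop. It is plain C that also compiles as CUDA host code, and it assumes 0-based indices, positions stored in a hypothetical `Point` struct, and results packed row by row into an output array of N*(N-1)/2 floats. Row x performs N-1-x distance computations, which is exactly the imbalance described above.

```cuda
#include <math.h>

// Hypothetical layout for one element's position in 3D space.
typedef struct { float x, y, z; } Point;

// Euclidean distance between two points.
static float distance3(Point a, Point b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

// Sequential reference: visits each pair (x, y) with x < y exactly once and
// writes the distances row by row into out[0 .. n*(n-1)/2 - 1].
// Row x contributes n-1-x entries, so early rows do most of the work.
void allPairsDistances(const Point *p, int n, float *out) {
    long long k = 0;  // position in the packed output
    for (int x = 0; x < n; ++x) {
        for (int y = x + 1; y < n; ++y) {
            out[k++] = distance3(p[x], p[y]);
        }
    }
}
```

With 3 points the output holds the pairs (0,1), (0,2), (1,2) in that order.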

What would be the best way to split the elements among Blocks/Threads?
dim3 grid(???);
dim3 threads(???);

You can think of this as simply traversing the upper triangular portion of an N×N matrix, excluding the diagonal. So what might be easiest is to set your block size to (THREAD_CNT,1,1) and your grid to (24000, (24000+1)/2/THREAD_CNT+1, 1), which launches at least 24000 × 23999 / 2 threads, one per pair. Then in your kernel, you could do something like this:

You’ll probably have to check my math there a bit, but I think that should work. Those crazy x and y formulas are the integer sequences needed to traverse an upper triangular matrix while skipping the diagonal (hence, the +1 on the y term).
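One widely used closed form for this kind of mapping (not necessarily the formulas referred to above) packs the pairs row by row, so row i starts at packed index i(2N-i-1)/2, and inverts that with a square root. The sketch below uses hypothetical helper names `rowStart` and `pairFromPacked`, evaluates the square root in double precision, and then nudges the row estimate up or down with exact integer comparisons, so sqrt roundoff cannot produce a wrong pair.

```cuda
#include <math.h>

// First packed index of row i when the pairs (i, j), i < j < N, are stored row by row.
// Row r holds N-1-r pairs, so rows 0..i-1 hold i*(2N-i-1)/2 pairs in total.
__host__ __device__ inline long long rowStart(long long i, long long N) {
    return i * (2 * N - i - 1) / 2;
}

// Maps a packed index k in [0, N*(N-1)/2) back to the pair (i, j) with i < j.
// The closed form solves rowStart(i) <= k for i. It is evaluated in double
// precision because float cannot represent every integer above 2^24, and the
// two loops then correct any off-by-one error left by sqrt() roundoff.
__host__ __device__ inline void pairFromPacked(long long k, int N, int *i, int *j) {
    double t = sqrt((double)(4LL * N * (N - 1) - 8 * k - 7));
    long long r = N - 2 - (long long)floor(t / 2.0 - 0.5);
    if (r < 0) r = 0;
    if (r > N - 2) r = N - 2;
    while (r + 1 <= N - 2 && rowStart(r + 1, N) <= k) ++r;  // estimate too small
    while (r > 0 && rowStart(r, N) > k) --r;                 // estimate too large
    *i = (int)r;
    *j = (int)(k - rowStart(r, N) + r + 1);
}
```

In a kernel, k would be the flattened thread index. Double-precision sqrt is much slower than sqrtf on many consumer GPUs and is unavailable before compute capability 1.3, which is one argument for an integer-only mapping.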

It’s also possible to do something like the following, using only integer arithmetic. I always get paranoid that floating-point roundoff will give wrong results when the numbers get big. A single-precision float has a 24-bit significand, so it cannot represent every integer above 2^24 (about 16.8 million); for example, 2^25 and 2^25+1 (about 33.5 million) round to the same float, never mind the accuracy of sqrtf. Here n gets as large as N(N-1)/2 = 24000 × 23999 / 2, about 288 million.

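[codebox]#define THREAD_CNT 256  // threads per block (assumed value; must match the launch)

// Visits every pair (i, j) with 0 <= i < j < N exactly once, using integer math only.
// N*(N-1) must fit in an int, which holds for N = 24000 (about 5.8e8).
__global__ void triangleKernel(float *buf, int N) {

    // Flatten the 2D grid of 1D blocks into a single linear pair index n.
    int n = (gridDim.x*blockIdx.y + blockIdx.x)*THREAD_CNT + threadIdx.x;
    if (n >= N*(N-1)/2) {
        return;  // surplus threads in the last blocks do nothing
    }
    // Read n as a position in the top rows of a matrix with N columns...
    int j = n % N;
    int i = n / N;
    // ...and fold entries on or below the diagonal into the remaining rows of the triangle.
    if (j <= i) {
        j = N-1-j;
        i = N-2-i;
    }
    buf[i*N+j] = 1.0f;  // placeholder: compute the distance between elements i and j here

}[/codebox]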

Index i ranges from 0 to N-2 inclusive, and j ranges from i+1 to N-1 inclusive. For 1-based indexing (1 to N inclusive), add 1 to both i and j.
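A sketch of how the pieces fit together for the actual distance calculation: the folding step of triangleKernel is factored into a hypothetical `__host__ __device__` helper `foldIndex` (so it can also be checked on the CPU), the kernel writes real distances, and the host function uses the launch configuration described earlier. The names `distanceKernel` and `launchDistances`, the x,y,z-interleaved position layout, and THREAD_CNT = 256 are all assumptions. Note that an N×N float matrix for N = 24000 is about 2.3 GB; writing `dist[n]` instead would halve that, at the cost of needing the mapping to look results up.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define THREAD_CNT 256  // threads per block (an assumed value)

// Same folding trick as triangleKernel: maps a linear index n in [0, N*(N-1)/2)
// to a unique pair (i, j) with 0 <= i < j < N, using integer math only.
__host__ __device__ inline void foldIndex(int n, int N, int *i, int *j) {
    int jj = n % N;
    int ii = n / N;
    if (jj <= ii) {  // on or below the diagonal: fold into the remaining rows
        jj = N - 1 - jj;
        ii = N - 2 - ii;
    }
    *i = ii;
    *j = jj;
}

// pos holds N points as interleaved x,y,z floats; dist is an N*N row-major
// matrix, and only the entries with i < j are written.
__global__ void distanceKernel(const float *pos, float *dist, int N) {
    int n = (gridDim.x * blockIdx.y + blockIdx.x) * THREAD_CNT + threadIdx.x;
    if (n >= N * (N - 1) / 2) return;  // surplus threads exit
    int i, j;
    foldIndex(n, N, &i, &j);
    float dx = pos[3 * i] - pos[3 * j];
    float dy = pos[3 * i + 1] - pos[3 * j + 1];
    float dz = pos[3 * i + 2] - pos[3 * j + 2];
    dist[i * N + j] = sqrtf(dx * dx + dy * dy + dz * dz);
}

// d_pos and d_dist are assumed to be device buffers of 3*N and N*N floats.
// gridDim.x = N must stay within the device's limit (65535 on older GPUs).
void launchDistances(const float *d_pos, float *d_dist, int N) {
    dim3 threads(THREAD_CNT, 1, 1);
    dim3 grid(N, (N + 1) / 2 / THREAD_CNT + 1, 1);
    distanceKernel<<<grid, threads>>>(d_pos, d_dist, N);
}
```

Because every thread does the same amount of work (one distance), this layout avoids the row imbalance of the sequential loop.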