Hi all,

I’m trying to develop a simple CUDA application to compute a density map of profiler soundings, i.e. to know how many points lie within a given radius of a given point.

The idea is to load the points’ UTM coordinates (done and working), transfer them to GPU memory (done and working too), launch one thread per point and let it compute its distance to all the other points. Here is the kernel code:

float *pts: flat table of point coordinates (x0 y0 z0 x1 y1 z1 … xn yn zn)
float *res: res[idx] contains the number of neighbours of the point with index “idx”. This table should be of type int, but I tried floats to see if it helps.

__global__ void lgzeDist(float *pts, float *res, ulong nb_line)
{
    int idx = (blockDim.x*blockDim.y)*(blockIdx.x*gridDim.y+blockIdx.y)+threadIdx.x*blockDim.y+threadIdx.y;
    float dist;
    float j;
    while(idx < nb_line){
        j=0;
        for (int i=0; i<nb_line; i++){
            dist=sqrtf(powf(pts[3*idx]-pts[3*i],2)+powf(pts[3*idx+1]-pts[3*i+1],2)+powf(pts[3*idx+2]-pts[3*i+2],2));
            if(dist>0)
                j++;
        } //for i
        res[idx]=j;
        idx+=blockDim.x*blockDim.y;
    }
}

Each thread is reading (and reading only) the float *pts table and writing in its dedicated cell of the float *res table so there should not be any shared data conflict.

nb_line is around 160000 with my test file, and with the dist>0 test j should reach nb_line-1 (i.e. the number of neighbours whose distance to the current point is strictly positive). As there are too many points for one thread each, I launch “only” 256 threads and each of them applies the algorithm to several points (the while loop).
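To make the expected result concrete, here is a minimal host-side sketch in plain C of the count each res[idx] should end up with. The function name count_neighbours_ref is hypothetical, and it assumes the same flat pts layout as the kernel. It compares the squared distance against zero instead of calling sqrtf, which gives the same answer as dist>0 except when coordinate differences are small enough to underflow.

```c
#include <stddef.h>

/* Hypothetical host-side reference (not part of the original code):
   for each point, count the other points whose distance to it is
   strictly positive. pts uses the same flat layout as the kernel:
   x0 y0 z0 x1 y1 z1 ... */
void count_neighbours_ref(const float *pts, size_t nb_line, int *res)
{
    for (size_t idx = 0; idx < nb_line; idx++) {
        int count = 0;
        for (size_t i = 0; i < nb_line; i++) {
            float dx = pts[3 * idx]     - pts[3 * i];
            float dy = pts[3 * idx + 1] - pts[3 * i + 1];
            float dz = pts[3 * idx + 2] - pts[3 * i + 2];
            /* Squared distance: positive exactly when the distance is
               positive (barring underflow), so no sqrt is needed. */
            float d2 = dx * dx + dy * dy + dz * dz;
            if (d2 > 0.0f)
                count++;
        }
        res[idx] = count;
    }
}
```

With 160000 points this does about 2.56e10 inner iterations, so it is better run on a small subset of the file and compared against the first entries of res copied back from the GPU.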

I have several problems with this code:

- If “i” gets too big, say 100000, pts[i] sometimes gives 0 whereas pts[100000] gives the right value.
- If I remove the “if (dist>0)” line, j ends up with the nb_line value (that’s OK), but as soon as I add an “if” condition j is stuck at 0, even if the condition is something like if(dist>=0), which should always be true.
- If instead of res[idx]=j I put res[idx]=dist (to store the last computed distance) I always get zero, but if I replace “dist” with a cut’n paste of the “sqrtf(pow(…)” line I get the right distance from the current point to the last one in the pts table.

I must be missing something but I don’t know what, and it is driving me a bit mad. Does anyone have an idea?

I’m using the latest available version of the SDK for Linux (2.0 beta2), kernel 2.6.24, NVIDIA drivers 177.13.

Thanks in advance.

–

Irvin