My task is to calculate the intersection point between a vector and a plane. The calculation part is quite easy.
For example, I have a 512 x 512 matrix, and every three neighbouring points (x,y), (x+1,y), (x,y+1) define a new plane.
Each cell in the matrix holds the Z value of its point, so I read the relevant points from global memory.
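
For reference, the per-plane calculation could look roughly like the sketch below. It is only an illustration: the names (rayPlaneT, sub3, dot3, cross3) are made up, the height matrix is assumed to be stored row-major as height[y*width + x], and it only intersects the plane; it does not check that the hit lies inside the triangle.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical vector helpers (not from any library).
__host__ __device__ inline float3 sub3(float3 a, float3 b)
{
    return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__host__ __device__ inline float dot3(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
__host__ __device__ inline float3 cross3(float3 a, float3 b)
{
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Returns t such that origin + t*dir lies on the plane through
// (x,y), (x+1,y), (x,y+1), or -1.0f if the ray is parallel to the
// plane or the plane is behind the origin.
// Assumption: row-major storage, and x+1 < width, y+1 < height of the map.
__host__ __device__ inline float rayPlaneT(const float* height, int width,
                                           int x, int y,
                                           float3 origin, float3 dir)
{
    float3 p0 = make_float3((float)x,       (float)y,       height[y * width + x]);
    float3 p1 = make_float3((float)(x + 1), (float)y,       height[y * width + x + 1]);
    float3 p2 = make_float3((float)x,       (float)(y + 1), height[(y + 1) * width + x]);

    float3 n = cross3(sub3(p1, p0), sub3(p2, p0));   // plane normal
    float denom = dot3(n, dir);
    if (fabsf(denom) < 1e-8f) return -1.0f;          // ray parallel to plane

    float t = dot3(n, sub3(p0, origin)) / denom;
    return (t >= 0.0f) ? t : -1.0f;                  // ignore hits behind the origin
}
```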

My problem occurs whenever a vector goes through multiple planes, and therefore I need to choose the plane closest to the camera (OpenGL).
However, I don't know how to communicate between ALL threads and not only those in the same block.
I thought of making a global variable that holds the nearest plane to the camera, but then I need some kind of CRITICAL CODE SECTION
for the threads that want to update the variable.

I would like to get some ideas.
Thanks for your help.

Hi,
If I understood correctly, you are trying to ray trace a height map of size 512x512 (as you mentioned). For every triangle of the grid you have a single thread that computes the intersection. Thus, a single ray (vector) is processed by a number of threads simultaneously. The problem is finding out which one of them computed the intersection nearest to the camera.

My idea is to create something like a z-buffer associated with the rays. When a thread computes an intersection, it reads the z-buffer entry for the given ray and checks whether the new distance from the camera to the intersection point is smaller than the one stored in the buffer. If so, it replaces the value in the buffer; otherwise it does nothing. What the buffer stores could be, first of all, the distance from the intersection to the camera, plus some kind of reference to the triangle/intersection the value refers to (an address or something similar). The replacement, of course, needs to be done atomically. There are some atomic functions in CUDA (atomicMin would be the most suitable) that would be useful here. Read about them first :)
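
A rough sketch of what I mean, under a few assumptions of mine: the distance and the triangle index are packed into one 64-bit word (distance bits in the upper half, index in the lower half) so that a single atomicMin updates both together. This relies on distances being non-negative, because only then do float bit patterns sort the same way as unsigned integers. It also needs a GPU that supports the 64-bit atomicMin (check the programming guide for the required compute capability). All names here (packHit, hitDistance, hitTriangle, nearestHit) are made up for the example.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Pack a non-negative distance and a triangle index into one 64-bit key.
// Smaller distance => smaller key, so atomicMin keeps the nearest hit.
__host__ __device__ inline unsigned long long packHit(float dist, unsigned int triId)
{
    unsigned int bits;
    memcpy(&bits, &dist, sizeof(bits));               // reinterpret float bits
    return (static_cast<unsigned long long>(bits) << 32) | triId;
}

__host__ __device__ inline float hitDistance(unsigned long long packed)
{
    unsigned int bits = static_cast<unsigned int>(packed >> 32);
    float dist;
    memcpy(&dist, &bits, sizeof(dist));
    return dist;
}

__host__ __device__ inline unsigned int hitTriangle(unsigned long long packed)
{
    return static_cast<unsigned int>(packed & 0xFFFFFFFFull);
}

// One thread per triangle, all working on the same ray.
// dist[i] is the distance already computed for triangle i (negative = miss).
// zbuf has one entry per ray and must be filled with 0xFF bytes
// (e.g. via cudaMemset) before launch, meaning "no hit yet".
__global__ void nearestHit(const float* dist, int numTriangles,
                           int ray, unsigned long long* zbuf)
{
    int tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= numTriangles) return;

    float d = dist[tri];
    if (d < 0.0f) return;                             // this triangle was missed

    // Atomic across ALL blocks, so no critical section is needed.
    atomicMin(&zbuf[ray], packHit(d, static_cast<unsigned int>(tri)));
}
```

After the kernel finishes, copy zbuf back to the host (or read it in the next kernel) and use hitDistance / hitTriangle to unpack the winner; an entry that still holds all 0xFF bytes means the ray hit nothing.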

Write back with more information if I got it wrong or if there is something you didn’t understand.

Well, there is a known problem, as you mentioned before: “the ray shooting problem”. I searched for some information about it, but it was too difficult to implement.

So I tried to implement something of my own.

First of all, I declared a kernel that saves all the intersection points (vector-plane intersection points) in a global array (device pointer), using code I developed for the CPU and found to be WORKING perfectly.