Vector Plane Intersection

Hey guys,

My task is to calculate the intersection point between a vector and a plane. The calculation part is quite easy.
For example, say I have a 512 x 512 matrix. For every 3 neighboring points I declare a new plane, (x,y) (x+1,y) (x,y+1),
by reading the relevant points from global memory. Each cell in the matrix holds the Z value of its point.

My problem occurs whenever a vector passes through multiple planes, so I need to choose the plane closest to the camera (OpenGL).
However, I don't know how to communicate between ALL threads and not only those in the same block.
I thought of making a global variable that holds the nearest plane to the camera, but then I need some kind of CRITICAL SECTION
for the threads that want to update the variable.

I would like to get some ideas.
Thanks for your help.

anyone? it’s kinda urgent

Thanks, igal

If I understood correctly, you are trying to ray trace a height map of size 512x512 (as you mentioned). For every triangle of the grid you have a single thread that computes the intersection. Thus, a single ray (vector) is processed by many threads simultaneously. The problem is finding which one of them computed the intersection nearest to the camera.

My idea is to create something like a z-buffer associated with the rays. When a thread computes an intersection, it reads the z-buffer entry for the given ray and checks whether the new distance from the camera to the intersection point is smaller than the one stored in the buffer. If so, it replaces the stored value; otherwise it does nothing. The buffer should store, first of all, the intersection distance to the camera, plus some kind of reference to the triangle/intersection that the value refers to (an address or something similar). The replacement, of course, needs to be done atomically. There are some functions in CUDA (atomicMin would be the most suitable) that would be useful here. Read about them first :)

Write some more information if I got it wrong or if there is something you didn't understand.
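
A minimal sketch of that z-buffer idea for a single ray, under some assumptions that are not from this thread: atomicMin has no float overload, so the sketch packs the distance bits and a triangle index into one 64-bit key and calls the unsigned long long overload (which needs a GPU that supports 64-bit atomicMin). The names packHit, nearestHit and findNearest are made up for illustration, and "distance <= 0" is used here to mean "no hit".

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: distance bits in the high word, triangle index in the
// low word. For positive IEEE-754 floats the raw bit pattern orders the same
// way as the value, so the smallest key belongs to the nearest hit.
__device__ unsigned long long packHit(float dist, unsigned int tri)
{
    return ((unsigned long long)__float_as_uint(dist) << 32) | tri;
}

// One thread per triangle; dist[i] stands in for the computed hit distance.
__global__ void nearestHit(const float *dist, int n, unsigned long long *nearest)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && dist[i] > 0.0f)          // <= 0 means "no hit" in this sketch
        atomicMin(nearest, packHit(dist[i], (unsigned int)i));
}

// Host wrapper: returns the index of the nearest hit (or -1 if there is none)
// and stores its distance in *outDist.
int findNearest(const float *hDist, int n, float *outDist)
{
    float *dDist = nullptr;
    unsigned long long *dNearest = nullptr;
    unsigned long long best = ~0ULL;      // "infinitely far" starting value
    cudaMalloc(&dDist, n * sizeof(float));
    cudaMalloc(&dNearest, sizeof(best));
    cudaMemcpy(dDist, hDist, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dNearest, &best, sizeof(best), cudaMemcpyHostToDevice);

    nearestHit<<<(n + 255) / 256, 256>>>(dDist, n, dNearest);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&best, dNearest, sizeof(best), cudaMemcpyDeviceToHost);
    cudaFree(dDist);
    cudaFree(dNearest);
    if (best == ~0ULL) return -1;

    unsigned int bits = (unsigned int)(best >> 32);
    memcpy(outDist, &bits, sizeof(bits)); // reinterpret the bits as a float
    return (int)(best & 0xFFFFFFFFu);
}

int main()
{
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 100.0f + (float)((i * 37) % n);
    h[700] = 3.5f;                        // plant one clearly nearest hit

    float d = 0.0f;
    int tri = findNearest(h, n, &d);
    printf("nearest triangle %d at distance %f\n", tri, d);
    return 0;
}
```

Because the update is a single atomic operation on global memory, it works across all blocks and no separate critical section is needed. With many rays, the same approach would presumably use one 64-bit entry per ray.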


Hey Victor, you've got it.

Well, there is a known problem, as you mentioned before, “the ray shooting problem”. I searched for some information about it, but it was too difficult to implement.

So I tried to implement something of my own.

First of all, I wrote a kernel that saves all the intersection points (vector-plane intersection points) in a global array (device pointer),

using code I developed for the CPU, where it works perfectly.

__global__ void findIntersc(float3 *inPoints,float4 *outPoints,float3 start,float3 lookingVec,float xFactor,float zFactor,int MAXX,int MAXY)
{
	unsigned idx = blockIdx.x*blockDim.x + threadIdx.x;
	int x = idx % MAXX;
	int z = idx / MAXX;
	float A,B,C,D,freeArg,concArg,t;

	// skip the last column; the parentheses matter because % binds tighter than +
	if ((idx + 1) % MAXX != 0 && idx < (MAXY - 2) * MAXX )
	{
		// the three corners of this cell's triangle
		float3 a;
		float3 b;
		float3 c;
		a.x = x     * xFactor;	a.y = inPoints[idx].z;		a.z = z*zFactor;
		b.x = (x+1) * xFactor;	b.y = inPoints[idx+1].z;	b.z = z*zFactor;
		c.x = x     * xFactor;	c.y = inPoints[idx+MAXX].z;	c.z = (z+1)*zFactor;

		float3 caDF,baDF;
		float3 interPnt;

		caDF.x = c.x - a.x; caDF.y = c.y - a.y; caDF.z = c.z - a.z;
		baDF.x = b.x - a.x; baDF.y = b.y - a.y; baDF.z = b.z - a.z;

		// plane normal (A,B,C) = caDF x baDF; note the sign of the middle component
		A =   (caDF.y * baDF.z) - (caDF.z * baDF.y);
		B = -((caDF.x * baDF.z) - (caDF.z * baDF.x));
		C =   (caDF.x * baDF.y) - (caDF.y * baDF.x);
		D =  -(A * a.x + B * a.y + C * a.z);

		// solve A*x + B*y + C*z + D = 0 along start + t*lookingVec
		freeArg = - (A *start.x + B * start.y + C * start.z) + (-D);
		concArg = A*lookingVec.x + B*lookingVec.y + C*lookingVec.z;
		t = freeArg / concArg;

		interPnt.x = start.x + t * lookingVec.x;
		interPnt.y = start.y + t * lookingVec.y;
		interPnt.z = start.z + t * lookingVec.z;

		// bounding-box test in the x/z plane; cells without a hit are left untouched
		if (interPnt.x >= a.x && interPnt.x <= b.x && interPnt.z >= a.z && interPnt.z <= c.z)
		{
			outPoints[idx].x = x ;
			outPoints[idx].y = inPoints[idx].z ;
			outPoints[idx].z = z ;
			outPoints[idx].w = sqrtf((interPnt.x - start.x)*(interPnt.x - start.x) +
			                         (interPnt.y - start.y)*(interPnt.y - start.y) +
			                         (interPnt.z - start.z)*(interPnt.z - start.z));
		}
	}
}
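
For reference, the plane/ray math the kernel is meant to implement can be written out as below. This is a sketch using symbols that are not in the original post: $S$ for start, $V$ for lookingVec, $N$ for the plane normal, and $u = c - a$, $w = b - a$ for the two edge vectors (caDF and baDF in the code).

```latex
\begin{aligned}
N &= (A, B, C) = u \times w
   = \bigl(u_y w_z - u_z w_y,\;\; u_z w_x - u_x w_z,\;\; u_x w_y - u_y w_x\bigr) \\
D &= -\,N \cdot a \\
P(t) &= S + t\,V \\
N \cdot P(t) + D = 0
  \;&\Longrightarrow\;
  t = \frac{-(N \cdot S) - D}{N \cdot V}
\end{aligned}
```

The numerator corresponds to freeArg and the denominator to concArg. The expression is undefined when $N \cdot V = 0$ (ray parallel to the plane), and a negative $t$ means the plane is behind the start point.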





The next thing I tried was to implement the reduce algorithm to find which intersection point is nearest to my camera:

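outPoint = pointer to the results (one entry per block)

__global__ void findMin(float4 *inPoints,float4 *outPoint,int n)
{
	extern __shared__ float4 sdata[];

	unsigned int tid = threadIdx.x;
	unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;

	// padding value for threads past the end of the input
	float4 infinitePoint;
	infinitePoint.x = 9999;	infinitePoint.y = 9999;	infinitePoint.z = 9999;	infinitePoint.w = 9999;

	// load everything into shared memory, then wait until all loads are done
	sdata[tid] = (i < n) ? inPoints[i] : infinitePoint;
	__syncthreads();

	// tree reduction; assumes blockDim.x is a power of two
	for(unsigned int s=blockDim.x/2; s>0; s>>=1)
	{
		// keep the entry with the smaller distance (w)
		if (tid < s && sdata[tid + s].w < sdata[tid].w)
			sdata[tid] = sdata[tid + s];
		__syncthreads();
	}

	// each block writes its own minimum; a second pass over these finishes the search
	if (tid == 0) outPoint[blockIdx.x] = sdata[0];
}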


The problem is that I never managed to get to the second kernel. After the first kernel finishes, I get garbage in my outPoints.

I tried using cudaMemset to set its values to 0 and then searching for the entries that had been changed, but as I said, I got garbage.
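
The zero-then-scan check described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: markHits is a made-up stand-in for the real intersection kernel, countHits is a hypothetical helper, and the launch configuration (256 threads per block) is arbitrary. The error checks after the launch are there because a kernel that fails to launch or crashes leaves the output buffer in whatever state it was in.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: it writes a hit record only for
// every 1000th cell and leaves all other cells untouched.
__global__ void markHits(float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && i % 1000 == 0)
        out[i] = make_float4((float)i, 0.0f, 0.0f, 1.0f + (float)i);
}

// Zero the buffer, run the kernel, check for errors, and count the entries
// whose w field changed. Returns -1 on any CUDA error.
int countHits(int n)
{
    float4 *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(float4)) != cudaSuccess) return -1;
    cudaMemset(dOut, 0, n * sizeof(float4));    // all-zero bytes read back as 0.0f

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up so every cell is covered
    markHits<<<blocks, threads>>>(dOut, n);

    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dOut);
        return -1;
    }

    std::vector<float4> h(n);
    cudaMemcpy(h.data(), dOut, n * sizeof(float4), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    int hits = 0;
    for (int i = 0; i < n; ++i)
        if (h[i].w != 0.0f) ++hits;             // a changed w marks a hit
    return hits;
}

int main()
{
    printf("changed entries: %d\n", countHits(512 * 512));
    return 0;
}
```

If the real kernel reports an error here, or the number of changed entries is far from what the CPU version finds, that narrows down whether the problem is in the launch, the indexing, or the copy back to the host.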

any ideas?