intersecting line segments on GPU

Here’s a little CUDA “object” that allows one to test whether two pieces of line segments intersect in 2D euclidean space. I tried to keep it branchless. It is loosely based on a code snippet I found in a game programming forum.

Although I’ve been doing CUDA for more than 2 years I am still convinced that someone else may know some more optimal way of solving things. Would you do it any differently?

My walls are going to be defined in constant memory, whereas my ray coordinates will dynamically depend on the threadIdx/blockIdx. I will also express the ray as a segment and simply use the intersection test below.

/**

 * A CUDA "object" to represent a wall segment.

 * Implemented without a constructor for maximum toolkit compatibility.

 */

typedef struct _segment

{

	/** intersection test */

	__host__ __device__ bool intersects(const _segment &seg) const

	{

		float2 u = make_float2(b.x - a.x, b.y - a.y);			  // b - a

		float2 v = make_float2(seg.b.x-seg.a.x, seg.b.y-seg.a.y);  // seg.b-seg.a

		float2 w = make_float2(a.x-seg.a.x, a.y-seg.a.y);		  // a - seg.a

		// branchless version

		float D = u.x * v.y - u.y * v.x;

		float s = v.x * w.y - v.y * w.x;

		float t = u.x * w.y - u.y * w.x;

		return (fabs(D) > 0.000001f) &&   // parallel test

			   (s >= 0 && s <= D) &&	  // inside first segment?

			   (t >= 0 && t <= D);		// inside the other segment?

		//float2 intersectPt = a + u * (s/d);					  //or b + v * (t/d);

	}

public:

	float2 a;  /**< wall start point */

	float2 b;  /**< wall end point */

} segment;

What about the case when they are parallel, but overlapping (or at least a portion of them overlaps?)

And also, if you are planning on using this on pre-fermi hardware, I’d use the __fmul and __fadd instructions when calculating D, s and t, because I’ve experienced the fmad mucking up my intersection tests before.

What about the case when they are parallel, but overlapping (or at least a portion of them overlaps?)

And also, if you are planning on using this on pre-fermi hardware, I’d use the __fmul and __fadd instructions when calculating D, s and t, because I’ve experienced the fmad mucking up my intersection tests before.

Ok, I am not considering this case because it is very rare.

But I noticed that my code was not giving correct results for D < 0. How would I do this more elegantly? This is really ugly now.

return  ( (D > +0.000001f)   &&	  // parallel test

				  (s >= 0 && s <= D) &&	  // inside first segment?

				  (t >= 0 && t <= D)	)	// inside the other segment?

						  ||

				( (D < -0.000001f)   &&	  // parallel test

				  (s >= D && s <= 0) &&	  // inside first segment?

				  (t >= D && t <= 0)	);   // inside the other segment?

Ok, I am not considering this case because it is very rare.

But I noticed that my code was not giving correct results for D < 0. How would I do this more elegantly? This is really ugly now.

return  ( (D > +0.000001f)   &&	  // parallel test

				  (s >= 0 && s <= D) &&	  // inside first segment?

				  (t >= 0 && t <= D)	)	// inside the other segment?

						  ||

				( (D < -0.000001f)   &&	  // parallel test

				  (s >= D && s <= 0) &&	  // inside first segment?

				  (t >= D && t <= 0)	);   // inside the other segment?

How about:

float d = fabs(D)

(d > 0.000001f) &&   // parallel test

(s >= 0 && s <= d) &&	  // inside first segment

(t >= 0 && t <=d);

??

Edit: (taking negative s and t into account):

(fabs(D) > 0.000001f) &&   // parallel test

(fabs(D-s) >= 0 && fabs(D-s) <= fabs(D)) &&	  // inside first segment

(fabs(D-t) >= 0 && fabs(D-t) <= fabs(D))

Sergey.

How about:

float d = fabs(D)

(d > 0.000001f) &&   // parallel test

(s >= 0 && s <= d) &&	  // inside first segment

(t >= 0 && t <=d);

??

Edit: (taking negative s and t into account):

(fabs(D) > 0.000001f) &&   // parallel test

(fabs(D-s) >= 0 && fabs(D-s) <= fabs(D)) &&	  // inside first segment

(fabs(D-t) >= 0 && fabs(D-t) <= fabs(D))

Sergey.

doesn’t catch the case where s or t has a different sign than D. So we definitely need to include a sign test here.

Does CUDA offer a sign() function for floats?

doesn’t catch the case where s or t has a different sign than D. So we definitely need to include a sign test here.

Does CUDA offer a sign() function for floats?

Sorry, I’ve modified the post with that case included

Sorry, I’ve modified the post with that case included

And knowing that arguments of comparison are always positive, it could be reduced to

if ((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) ^ 0x80000000)

{

}

Edit:

And after fixing screw-up with the sign test you get:

(((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) & 0x80000000) == 0)

Hopefully I didn’t screw up somewhere else ;)

And knowing that arguments of comparison are always positive, it could be reduced to

if ((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) ^ 0x80000000)

{

}

Edit:

And after fixing screw-up with the sign test you get:

(((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) & 0x80000000) == 0)

Hopefully I didn’t screw up somewhere else ;)

You could at the cost of two divisions (use __fdividef), divide s and t by D and then you only need to check if s and t are between 0 and 1.

if (fabs(D) > .000001f) {

	s = __fdividef(s, D);

	t = __fdividef(t, D);

	return (s >= 0 && s <= 1) && (t >= 0 && t <=1);

}

else {

	//you can choose to add code to handle the parallel intersecting case or not

	//but I've found that eventually you'll hit it, so why not add the code

	//since this branch will almost never be taken, you're not really incurring

	//a penalty by its existince

}

Wanted to add that if you really don’t care about the else branch, then you don’t need the if either. If s or t becomes NAN due to division by 0, the comparisons will fail and it will return false.

You could at the cost of two divisions (use __fdividef), divide s and t by D and then you only need to check if s and t are between 0 and 1.

if (fabs(D) > .000001f) {

	s = __fdividef(s, D);

	t = __fdividef(t, D);

	return (s >= 0 && s <= 1) && (t >= 0 && t <=1);

}

else {

	//you can choose to add code to handle the parallel intersecting case or not

	//but I've found that eventually you'll hit it, so why not add the code

	//since this branch will almost never be taken, you're not really incurring

	//a penalty by its existince

}

Wanted to add that if you really don’t care about the else branch, then you don’t need the if either. If s or t becomes NAN due to division by 0, the comparisons will fail and it will return false.

There is signbit(float). I’m not sure if it behaves according to your (or my) plan with zero’s (which can be positive or negative).

There is signbit(float). I’m not sure if it behaves according to your (or my) plan with zero’s (which can be positive or negative).