intersecting line segments on GPU

cbuchner1 · September 17, 2010, 10:28am

Here’s a little CUDA “object” that allows one to test whether two pieces of line segments intersect in 2D euclidean space. I tried to keep it branchless. It is loosely based on a code snippet I found in a game programming forum.

Although I’ve been doing CUDA for more than 2 years I am still convinced that someone else may know some more optimal way of solving things. Would you do it any differently?

My walls are going to be defined in constant memory, whereas my ray coordinates will dynamically depend on the threadIdx/blockIdx. I will also express the ray as a segment and simply use the intersection test below.

/**

 * A CUDA "object" to represent a wall segment.

 * Implemented without a constructor for maximum toolkit compatibility.

 */

typedef struct _segment

{

	/** intersection test */

	__host__ __device__ bool intersects(const _segment &seg) const

	{

		float2 u = make_float2(b.x - a.x, b.y - a.y);			  // b - a

		float2 v = make_float2(seg.b.x-seg.a.x, seg.b.y-seg.a.y);  // seg.b-seg.a

		float2 w = make_float2(a.x-seg.a.x, a.y-seg.a.y);		  // a - seg.a

		// branchless version

		float D = u.x * v.y - u.y * v.x;

		float s = v.x * w.y - v.y * w.x;

		float t = u.x * w.y - u.y * w.x;

		return (fabs(D) > 0.000001f) &&   // parallel test

			   (s >= 0 && s <= D) &&	  // inside first segment?

			   (t >= 0 && t <= D);		// inside the other segment?

		//float2 intersectPt = a + u * (s/d);					  //or b + v * (t/d);

	}

public:

	float2 a;  /**< wall start point */

	float2 b;  /**< wall end point */

} segment;

eelsen · September 17, 2010, 6:17pm

Here’s a little CUDA “object” that allows one to test whether two pieces of line segments intersect in 2D euclidean space. I tried to keep it branchless. It is loosely based on a code snippet I found in a game programming forum.

Although I’ve been doing CUDA for more than 2 years I am still convinced that someone else may know some more optimal way of solving things. Would you do it any differently?

My walls are going to be defined in constant memory, whereas my ray coordinates will dynamically depend on the threadIdx/blockIdx. I will also express the ray as a segment and simply use the intersection test below.
/**

 * A CUDA "object" to represent a wall segment.

 * Implemented without a constructor for maximum toolkit compatibility.

 */

typedef struct _segment

{

	/** intersection test */

	__host__ __device__ bool intersects(const _segment &seg) const

	{

		float2 u = make_float2(b.x - a.x, b.y - a.y);			  // b - a

		float2 v = make_float2(seg.b.x-seg.a.x, seg.b.y-seg.a.y);  // seg.b-seg.a

		float2 w = make_float2(a.x-seg.a.x, a.y-seg.a.y);		  // a - seg.a

		// branchless version

		float D = u.x * v.y - u.y * v.x;

		float s = v.x * w.y - v.y * w.x;

		float t = u.x * w.y - u.y * w.x;

		return (fabs(D) > 0.000001f) &&   // parallel test

			   (s >= 0 && s <= D) &&	  // inside first segment?

			   (t >= 0 && t <= D);		// inside the other segment?

		//float2 intersectPt = a + u * (s/d);					  //or b + v * (t/d);

	}

public:

	float2 a;  /**< wall start point */

	float2 b;  /**< wall end point */

} segment;

What about the case when they are parallel, but overlapping (or at least a portion of them overlaps?)

And also, if you are planning on using this on pre-fermi hardware, I’d use the __fmul and __fadd instructions when calculating D, s and t, because I’ve experienced the fmad mucking up my intersection tests before.

eelsen · September 17, 2010, 6:17pm

Here’s a little CUDA “object” that allows one to test whether two pieces of line segments intersect in 2D euclidean space. I tried to keep it branchless. It is loosely based on a code snippet I found in a game programming forum.

Although I’ve been doing CUDA for more than 2 years I am still convinced that someone else may know some more optimal way of solving things. Would you do it any differently?

My walls are going to be defined in constant memory, whereas my ray coordinates will dynamically depend on the threadIdx/blockIdx. I will also express the ray as a segment and simply use the intersection test below.
/**

 * A CUDA "object" to represent a wall segment.

 * Implemented without a constructor for maximum toolkit compatibility.

 */

typedef struct _segment

{

	/** intersection test */

	__host__ __device__ bool intersects(const _segment &seg) const

	{

		float2 u = make_float2(b.x - a.x, b.y - a.y);			  // b - a

		float2 v = make_float2(seg.b.x-seg.a.x, seg.b.y-seg.a.y);  // seg.b-seg.a

		float2 w = make_float2(a.x-seg.a.x, a.y-seg.a.y);		  // a - seg.a

		// branchless version

		float D = u.x * v.y - u.y * v.x;

		float s = v.x * w.y - v.y * w.x;

		float t = u.x * w.y - u.y * w.x;

		return (fabs(D) > 0.000001f) &&   // parallel test

			   (s >= 0 && s <= D) &&	  // inside first segment?

			   (t >= 0 && t <= D);		// inside the other segment?

		//float2 intersectPt = a + u * (s/d);					  //or b + v * (t/d);

	}

public:

	float2 a;  /**< wall start point */

	float2 b;  /**< wall end point */

} segment;

What about the case when they are parallel, but overlapping (or at least a portion of them overlaps?)

And also, if you are planning on using this on pre-fermi hardware, I’d use the __fmul and __fadd instructions when calculating D, s and t, because I’ve experienced the fmad mucking up my intersection tests before.

cbuchner1 · September 20, 2010, 11:57am

Ok, I am not considering this case because it is very rare.

But I noticed that my code was not giving correct results for D < 0. How would I do this more elegantly? This is really ugly now.

return  ( (D > +0.000001f)   &&	  // parallel test

				  (s >= 0 && s <= D) &&	  // inside first segment?

				  (t >= 0 && t <= D)	)	// inside the other segment?

						  ||

				( (D < -0.000001f)   &&	  // parallel test

				  (s >= D && s <= 0) &&	  // inside first segment?

				  (t >= D && t <= 0)	);   // inside the other segment?

cbuchner1 · September 20, 2010, 11:57am

Ok, I am not considering this case because it is very rare.

But I noticed that my code was not giving correct results for D < 0. How would I do this more elegantly? This is really ugly now.

return  ( (D > +0.000001f)   &&	  // parallel test

				  (s >= 0 && s <= D) &&	  // inside first segment?

				  (t >= 0 && t <= D)	)	// inside the other segment?

						  ||

				( (D < -0.000001f)   &&	  // parallel test

				  (s >= D && s <= 0) &&	  // inside first segment?

				  (t >= D && t <= 0)	);   // inside the other segment?

sergeyn · September 20, 2010, 12:35pm

How about:

float d = fabs(D)

(d > 0.000001f) &&   // parallel test

(s >= 0 && s <= d) &&	  // inside first segment

(t >= 0 && t <=d);

??

Edit: (taking negative s and t into account):

(fabs(D) > 0.000001f) &&   // parallel test

(fabs(D-s) >= 0 && fabs(D-s) <= fabs(D)) &&	  // inside first segment

(fabs(D-t) >= 0 && fabs(D-t) <= fabs(D))

Sergey.

sergeyn · September 20, 2010, 12:35pm

How about:

float d = fabs(D)

(d > 0.000001f) &&   // parallel test

(s >= 0 && s <= d) &&	  // inside first segment

(t >= 0 && t <=d);

??

Edit: (taking negative s and t into account):

(fabs(D) > 0.000001f) &&   // parallel test

(fabs(D-s) >= 0 && fabs(D-s) <= fabs(D)) &&	  // inside first segment

(fabs(D-t) >= 0 && fabs(D-t) <= fabs(D))

Sergey.

cbuchner1 · September 20, 2010, 12:37pm

doesn’t catch the case where s or t has a different sign than D. So we definitely need to include a sign test here.

Does CUDA offer a sign() function for floats?

cbuchner1 · September 20, 2010, 12:37pm

doesn’t catch the case where s or t has a different sign than D. So we definitely need to include a sign test here.

Does CUDA offer a sign() function for floats?

sergeyn · September 20, 2010, 12:40pm

Sorry, I’ve modified the post with that case included

sergeyn · September 20, 2010, 12:40pm

Sorry, I’ve modified the post with that case included

sergeyn · September 20, 2010, 12:51pm

And knowing that arguments of comparison are always positive, it could be reduced to

if ((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) ^ 0x80000000)

{

…

}

Edit:

And after fixing screw-up with the sign test you get:

(((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) & 0x80000000) == 0)

Hopefully I didn’t screw up somewhere else ;)

sergeyn · September 20, 2010, 12:51pm

And knowing that arguments of comparison are always positive, it could be reduced to

if ((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) ^ 0x80000000)

{

…

}

Edit:

And after fixing screw-up with the sign test you get:

(((as_int(fabs(D) - 0.0001f) | as_int(fabs(D)-fabs(D-s)) | as_int(fabs(D)-(fabs(D-t))) & 0x80000000) == 0)

Hopefully I didn’t screw up somewhere else ;)

eelsen · September 20, 2010, 4:53pm

Ok, I am not considering this case because it is very rare.

But I noticed that my code was not giving correct results for D < 0. How would I do this more elegantly? This is really ugly now.
return  ( (D > +0.000001f)   &&	  // parallel test

				  (s >= 0 && s <= D) &&	  // inside first segment?

				  (t >= 0 && t <= D)	)	// inside the other segment?

						  ||

				( (D < -0.000001f)   &&	  // parallel test

				  (s >= D && s <= 0) &&	  // inside first segment?

				  (t >= D && t <= 0)	);   // inside the other segment?

You could at the cost of two divisions (use __fdividef), divide s and t by D and then you only need to check if s and t are between 0 and 1.

if (fabs(D) > .000001f) {

	s = __fdividef(s, D);

	t = __fdividef(t, D);

	return (s >= 0 && s <= 1) && (t >= 0 && t <=1);

}

else {

	//you can choose to add code to handle the parallel intersecting case or not

	//but I've found that eventually you'll hit it, so why not add the code

	//since this branch will almost never be taken, you're not really incurring

	//a penalty by its existince

}

Wanted to add that if you really don’t care about the else branch, then you don’t need the if either. If s or t becomes NAN due to division by 0, the comparisons will fail and it will return false.

eelsen · September 20, 2010, 4:53pm

Ok, I am not considering this case because it is very rare.

But I noticed that my code was not giving correct results for D < 0. How would I do this more elegantly? This is really ugly now.
return  ( (D > +0.000001f)   &&	  // parallel test

				  (s >= 0 && s <= D) &&	  // inside first segment?

				  (t >= 0 && t <= D)	)	// inside the other segment?

						  ||

				( (D < -0.000001f)   &&	  // parallel test

				  (s >= D && s <= 0) &&	  // inside first segment?

				  (t >= D && t <= 0)	);   // inside the other segment?

You could at the cost of two divisions (use __fdividef), divide s and t by D and then you only need to check if s and t are between 0 and 1.

if (fabs(D) > .000001f) {

	s = __fdividef(s, D);

	t = __fdividef(t, D);

	return (s >= 0 && s <= 1) && (t >= 0 && t <=1);

}

else {

	//you can choose to add code to handle the parallel intersecting case or not

	//but I've found that eventually you'll hit it, so why not add the code

	//since this branch will almost never be taken, you're not really incurring

	//a penalty by its existince

}

Wanted to add that if you really don’t care about the else branch, then you don’t need the if either. If s or t becomes NAN due to division by 0, the comparisons will fail and it will return false.

jan.heckman · September 20, 2010, 4:57pm

There is signbit(float). I’m not sure if it behaves according to your (or my) plan with zero’s (which can be positive or negative).

jan.heckman · September 20, 2010, 4:57pm

There is signbit(float). I’m not sure if it behaves according to your (or my) plan with zero’s (which can be positive or negative).