Beginner Tutorial Beginner Tutorial- finding point in poly

If someone is considering writing a tutorial for CUDA then I have a topic I would love you to cover:

Finding a point inside a polygon

Example C code is here:…tes/pnpoly.html

I have 2d polygons with many vertices that I need to pass a Lat/Long to and get back whether the Lat/Long is inside the polygon or not. Would CUDA be the appropriate tool for this?

Ever hopefull…

That looks like an interesting problem to benchmark I might try to implement it once my 8800 GTX shows up. Some questions about your use case:

  1. How many vertices are in your polygon? This will determine in what area of GPU memory they can be stored.

  2. How many polygons are there? Do they change frequently during the execution of the program?

  3. Do you need to test if just one point is inside a polygon at a time? Would it be possible to test several at once?

  1. The polygons I have represent suburbs so they can go from simple shapes (10 or so vertices) to coastal areas where the shape of the suburb maps to the coastline (100,000 or more vertices).

  2. I have about 6000 polygons which thankfully only change annually.

  3. I could pass many points in if that were more efficient.

If your lat/long pairs are a regular, square grid, there’s a very easy way to do this using the old GPGPU ways: Opengl Tesselation. You just set the pixel coords up to get a 1:1 to your lat/long grid, set up the tesselator (lots of examples on the web), set up for render-to-texture, then render your polygons. Then OpenGL will handle all the details of figuring out what’s inside and outside your polygons - anything shaded is inside!

I have some code that I wrote for work that does this (sorry, can’t release it), but it only took me an hour or two to hack together.

If your lat/long points aren’t on a regular grid, you’ll have to do the work the hard way.

But what existing GPU hardware supports tesselation?


I’ve seen very nice speedups using the gluTesselator routines, so I’m guessing at least some of that process is accelerated on the GPU (probably just the triangle fill though.)

I will show my ignorance here, but it’s not clear to me why you’d need tesselation beyong what glBegin(GL_POLYGON) would do. Would the followig approach work?

  1. Assume that your test-point is in the center of a render target.
  2. Render all the polys tranformed so that the test-point is at (0,0). Color-code each poly (this will allows rendering (2^32 - 1) different polys, -1 for the “no color”).
  3. Read the pixel in the center of the render target and check its color.

I think you wouldn’t even need a large render target. A small render target with the corresponding ortho2D projection and viewport would possibly clip many polys, speeding up the test as a result. Even if your coordinates are not on a regular grid, you might be able to “zoom in” with the projection matrix when rendering the polys to obtain the desired accuracy. Seeing that the data is geographical, even coordinates obtained with GPS have a tolerance of a few feet, so you should be OK.

CUDA approach may be faster since you wouldn’t have to compute the “colors” for all the pixels you’re not really interested. If you want to run the inclusion test code you have, I’d suggest parallelizing over the edges (the loop in your code), as well as multiple polygons. The input data to a CUDA kernel would be edges (two endpoints and the id of the polygon to which the edge belongs). I’d make the test-point a constant since it’s the same for all CUDA threads. Each thread would process its share of edges for intersection. If an intersection is detected, the inside/outside flag for the particular polygon would be switched using the polygon id as an index into output array. Since you only deal with about 6000 polys, each threadblock could maintain a local copy of the entire output array (about 6KB). The only tricky part would be updating the global memory to combine the results of each threadblock, since there’s no synchronization among threadblocks to avoid race conditions. You may have to write as many copies as threadblocks used, then have the CPU copy these into its memory and combine them.

Question for crispy: do you need another tutorial, or do you really want to port your code to CUDA? I’d be curious to hear how long it takes you to process your data on a current CPU.


I don’t have time for a full reply here, but glBegin(GL_POLYGON) can’t handle concave polygons or polygons that have edges which cross one another.


I think there are a lot point in poly tests without tesselating and rasterizing everything.

I only remember: do all ray/edge intersection in one direction (up or right, in your case lat or long) and count them. If they are even than the point is outside, if odd than it’s inside the poly…

You may use bbox test to avoid unnessecary ray/edge intersection calculation as you dont need the exact intersection point, but only the intersection count. If your ray hits the bbox of the edge, there is an intersection in it and you can count it (you dont need the exact location). There is an exception to this: if the starting point is also in the bbox, you have to decide if there’s an intersection in ray direction.

In CUDA you may generate a thread for every edge in the poly and sum up intersection count in shared mem.

Should have watched the link in the thread start first. It’s the method I

remembered from CG1 course…

Also Eric Haines collected some point in poly tests:

You spotted my devious plan.

For me personally a tutorial would be great because I am out of my depth here, but if someone were to write the code then I would be able to follow along. I really have two questions that I was hoping for answers to in this thread:

Is this a problem that is suited to CUDA/GPU’s?

What would the performance increase be?

Currently using the code in the link at the start of the thread on my 6000 polys it takes anywhere between 1- to 60 seconds to do the following:

Load the the first poly from a db and put into array

run the point in poly code

load the next poly into the array


There are a number of open source projects that implement OpenGIS® Implementation Specification for Geographic information - Simple feature access ( and particularly this function:…ns.htm#Contains

So I think work done towards this could benefit many of us.

I think the test can be parallelized using CUDA.

Speedup is harder to predict. To process one edge you’ll need 5 loads for the 4 floats describing it, plus an integer index to the poly. The unit computation spends most operations on the expression in the if-statement, which has a few arithmetic operations. So, testing a single edge would be pretty memory intensive. If I had to make a really wild guess, I’d estimate a speedup of 20 or higher on a G80, compared to a single thread on a 2GHz Core Duo (matrix multiplication is memory intensive and a pretty straightforward implementation can get speedups of over 40, so I’m liberally allowing for load balancing and data writeback issues).

Does the execution time of your implementation vary between 1s and 60s depending on the test point? Also, do you fetch the polygon from the database across the network? Depending on the database, it may be more efficient to grab all the polygons once and then start testing.