I will show my ignorance here, but it’s not clear to me why you’d need tessellation beyond what glBegin(GL_POLYGON) would do. Would the following approach work?
- Assume that your test-point is in the center of a render target.
- Render all the polys transformed so that the test-point is at (0,0). Color-code each poly (a 32-bit color allows distinguishing 2^32 - 1 different polys, with one value reserved for “no color”).
- Read the pixel in the center of the render target and check its color (sketched below).
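In code, the idea might look something like this. This is only a minimal sketch in legacy fixed-function OpenGL; the Poly struct, pickPolygon, and the w/h viewport size are hypothetical stand-ins for your data, and I’m using 24-bit RGB ids here, which is more than enough for ~6000 polys:

```
#include <GL/gl.h>

struct Poly { int numVerts; const float* verts; };  // packed x,y pairs (hypothetical)

// Returns the index of the polygon covering (px,py), or -1 for none.
// Assumes a w-by-h render target is already current.
int pickPolygon(const Poly* polys, int numPolys, float px, float py,
                int w, int h)
{
    glDisable(GL_DITHER);      // dithering would corrupt the color ids
    glDisable(GL_LIGHTING);
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);  // 0 is the reserved "no color"
    glClear(GL_COLOR_BUFFER_BIT);

    for (int i = 0; i < numPolys; ++i) {
        unsigned int id = (unsigned int)i + 1u;  // color-code each poly
        glColor3ub(id & 0xFF, (id >> 8) & 0xFF, (id >> 16) & 0xFF);
        glBegin(GL_POLYGON);
        for (int v = 0; v < polys[i].numVerts; ++v)
            glVertex2f(polys[i].verts[2 * v]     - px,   // translate so the
                       polys[i].verts[2 * v + 1] - py);  // test point is (0,0)
        glEnd();
    }

    // Read the single pixel at the center of the render target.
    unsigned char pix[3];
    glReadPixels(w / 2, h / 2, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pix);
    unsigned int id = pix[0] | (pix[1] << 8) | ((unsigned int)pix[2] << 16);
    return (int)id - 1;        // -1 means the point is in no polygon
}
```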
I think you wouldn’t even need a large render target. A small render target with a matching ortho2D projection and viewport would clip away many polys, speeding up the test as a result. Even if your coordinates are not on a regular grid, you might be able to “zoom in” with the projection matrix when rendering the polys to obtain the desired accuracy (see the setup below). Seeing that the data is geographical, even coordinates obtained with GPS have a tolerance of a few feet, so you should be OK.
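For reference, that setup could be as simple as the following; EXTENT is a hypothetical half-width in map units, chosen to match the accuracy you need:

```
glViewport(0, 0, 16, 16);    // tiny render target: clipping culls most polys
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluOrtho2D(-EXTENT, EXTENT, -EXTENT, EXTENT);  // "zoom in" around (0,0)
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
```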
A CUDA approach may be faster since you wouldn’t have to compute the “colors” for all the pixels you’re not really interested in. If you want to run the inclusion test code you have, I’d suggest parallelizing over the edges (the loop in your code), as well as over multiple polygons. The input data to a CUDA kernel would be edges (two endpoints plus the id of the polygon to which the edge belongs). I’d make the test-point a constant since it’s the same for all CUDA threads. Each thread would process its share of edges for intersection. If an intersection is detected, the inside/outside flag for the particular polygon would be toggled, using the polygon id as an index into the output array. Since you only deal with about 6000 polys, each threadblock could maintain a local copy of the entire output array (about 6KB with one byte per poly). The only tricky part would be updating global memory to combine the results of the threadblocks, since there’s no synchronization among threadblocks to avoid race conditions. You may have to write as many copies as there are threadblocks, then have the CPU copy these into its memory and combine them.
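Something along these lines, as a rough sketch rather than a drop-in implementation: the Edge struct and kernel are my own invention, not your code, and I’m toggling the flags with atomicXor in shared memory (needs compute capability 1.2+) and bit-packing them, which shrinks the per-block copy from ~6KB to under 1KB:

```
#include <cuda_runtime.h>

#define NUM_POLYS  6000                      // roughly the count you mention
#define FLAG_WORDS ((NUM_POLYS + 31) / 32)   // parity flags bit-packed into uints

struct Edge {
    float x1, y1, x2, y2;   // the two endpoints
    int   polyId;           // id of the polygon this edge belongs to
};

__constant__ float2 testPoint;  // same for every thread, hence constant memory

__global__ void crossingParity(const Edge* edges, int numEdges,
                               unsigned int* blockFlags)  // gridDim.x * FLAG_WORDS
{
    __shared__ unsigned int flags[FLAG_WORDS];  // per-block copy of all flags

    // Cooperatively zero the shared flag array.
    for (int i = threadIdx.x; i < FLAG_WORDS; i += blockDim.x)
        flags[i] = 0u;
    __syncthreads();

    // Each thread tests its share of edges: does the horizontal ray from
    // testPoint cross this edge? (standard crossing-number test)
    for (int e = blockIdx.x * blockDim.x + threadIdx.x; e < numEdges;
         e += gridDim.x * blockDim.x)
    {
        Edge ed = edges[e];
        bool straddles = (ed.y1 > testPoint.y) != (ed.y2 > testPoint.y);
        if (straddles &&
            testPoint.x < (ed.x2 - ed.x1) * (testPoint.y - ed.y1)
                          / (ed.y2 - ed.y1) + ed.x1)
        {
            // Toggle the inside/outside flag for this polygon.
            atomicXor(&flags[ed.polyId >> 5], 1u << (ed.polyId & 31));
        }
    }
    __syncthreads();

    // No synchronization across threadblocks, so each block writes out its
    // own copy; the CPU combines the copies afterwards.
    for (int i = threadIdx.x; i < FLAG_WORDS; i += blockDim.x)
        blockFlags[blockIdx.x * FLAG_WORDS + i] = flags[i];
}
```

On the CPU you’d XOR the per-block flag words together (XOR is associative, so the block order doesn’t matter); bit i of the combined result then tells you whether polygon i contains the point.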
Question for crispy: do you need another tutorial, or do you really want to port your code to CUDA? I’d be curious to hear how long it takes you to process your data on a current CPU.