PNPOLY approach faster on a single-threaded CPU than in CUDA: can someone tell me why?

Hello to all,

My computer has a quad-core Intel Q6600 processor and a Quadro 4000 graphics card. The operating system is Windows 7 64-bit, with
Visual Studio 2010 with CUDA 4.0.

I’m trying to port the PNPOLY algorithm (see Point Inclusion in Polygon Test) to CUDA.

The implementation is not yet finished, but what I have so far is enough for this question.

I just created an array with the X, Y, Z coordinates of a closed polygon:

float coordinates[] =
{
    721042.84, 4504290.03, 916.86,
    721046.61, 4504290.14, 0.00,
    721046.54, 4504289.48, 916.88,
    /* ... remaining vertices omitted ... */
};

and I modified the original algorithm like this:

int pnpoly2(int nvert, float *vertex, float testx, float testy)
{
    int i, j, c = 0;
    for (i = 1, j = i - 1; i < nvert; j = i++) {
        if ( ((vertex[3*i+1] > testy) != (vertex[3*j+1] > testy)) &&
             (testx < (vertex[3*j] - vertex[3*i]) * (testy - vertex[3*i+1]) / (vertex[3*j+1] - vertex[3*i+1]) + vertex[3*i]) )
            c = !c;
    }
    return c;
}
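
The calls further down use numberOfVertex - 1; here is a minimal sketch of how numberOfVertex itself can be derived from the interleaved array above (assuming coordinates stays a static array in the same file):

/* One vertex = three consecutive floats (x, y, z) in the interleaved array.
   Sketch only: assumes 'coordinates' is the static array shown above. */
int numberOfVertex = (int)(sizeof(coordinates) / (3 * sizeof(float)));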

After that, I created the following CUDA kernel:

__global__ void pnpolyGPU(const float *vertex, float testx, float testy, int *results)
{
int id = blockIdx.x;
int indexOriginX = (blockIdx.x + 1) * 3;
int indexOriginY = (blockIdx.x + 1) * 3 + 1;
int indexDestinationX = blockIdx.x * 3;
int indexDestinationY = blockIdx.x * 3 + 1;

if ( ((vertex[indexOriginY]>testy) != (vertex[indexDestinationY]>testy)) && (testx < (vertex[indexDestinationX]-vertex[indexOriginX]) * (testy-vertex[indexOriginY]) / (vertex[indexDestinationY]-vertex[indexOriginY]) + vertex[indexOriginX]) )
	results[id] = 1;
else
	results[id] = 0;

}
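
Note that this kernel only stores a 0/1 crossing flag per edge; to get the final inside/outside answer, these flags still have to be combined with the same parity rule the CPU loop applies via c = !c. A minimal host-side sketch (assuming results has been copied back from resultsGPU with cudaMemcpy and holds numberOfVertex - 1 entries):

/* Sketch only: combine the per-edge crossing flags produced by pnpolyGPU
   with the same parity rule the CPU version applies via c = !c. */
int pointInPolygon(const int *results, int numEdges)
{
    int c = 0;
    for (int e = 0; e < numEdges; ++e)
        if (results[e])
            c = !c;   /* toggle on every crossing */
    return c;         /* 1 = inside, 0 = outside */
}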

After that, I calculate the bounding box (min/max) of the polygon and then call both implementations like this:

SYSTEMTIME st;
GetSystemTime(&st);
printf("The system time is: %02d:%02d:%02d\n", st.wHour, st.wMinute, st.wSecond);

for( float y=yMin; y<yMax; y+=2 ) {
	for( float x=xMin; x<xMax; x+=2 ) {
		pnpolyGPU<<<numberOfVertex - 1,1>>>(valuesGPU, x, y, resultsGPU);
	}
}

SYSTEMTIME st2;
GetSystemTime(&st2);

printf("The system time is: %02d:%02d:%02d\n", st2.wHour, st2.wMinute, st2.wSecond);

for( float y=yMin; y<yMax; y+=2 ) {
	for( float x=xMin; x<xMax; x+=2 ) {
		pnpoly2(numberOfVertex-1, coordinates, x, y);
	}
}

SYSTEMTIME st3;
GetSystemTime(&st3);

printf("The system time is: %02d:%02d:%02d\n", st3.wHour, st3.wMinute, st3.wSecond);

To my surprise, I saw that the CPU implementation is faster than the CUDA implementation.
Why is the CPU implementation faster than the CUDA one?

Thank you in advance.
PnpolyCUDA.zip (30.7 KB)

Solved!

Using both blocks and threads, the performance improves dramatically.
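
For reference, a minimal sketch of what "blocks and threads" can look like for this kernel: the edges are spread across a grid of blocks with many threads per block instead of one block per edge. The kernel name pnpolyGPU2, the numEdges parameter, and the launch configuration below are illustrative assumptions, not necessarily the exact code used:

/* Sketch only: same per-edge test as pnpolyGPU above, but the edges are
   distributed over blocks *and* threads instead of one block per edge. */
__global__ void pnpolyGPU2(const float *vertex, float testx, float testy,
                           int *results, int numEdges)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;   /* global edge index */
    if (id >= numEdges)
        return;

    int indexOriginX      = (id + 1) * 3;
    int indexOriginY      = (id + 1) * 3 + 1;
    int indexDestinationX = id * 3;
    int indexDestinationY = id * 3 + 1;

    if ( ((vertex[indexOriginY] > testy) != (vertex[indexDestinationY] > testy)) &&
         (testx < (vertex[indexDestinationX] - vertex[indexOriginX]) * (testy - vertex[indexOriginY]) /
                  (vertex[indexDestinationY] - vertex[indexOriginY]) + vertex[indexOriginX]) )
        results[id] = 1;
    else
        results[id] = 0;
}

Launched, for example, with 256 threads per block:

pnpolyGPU2<<<(numberOfVertex - 1 + 255) / 256, 256>>>(valuesGPU, x, y, resultsGPU, numberOfVertex - 1);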