Execution problem: works in emulation and with 100000 points, fails with 1000000

Sambi · December 1, 2008, 3:06am

Hello all,

I am working on a data set of 1000000 points and I am having a weird problem in execution. The code works fine in emulation mode and with 100000 points, but it won’t work with 1000000.

__global__ void cuda_delaunay_func(int blocks, block_properties *block_p)
{
	int bid = blockIdx.x;
	long int i;
	int m, flag;
	float xn, yn, zn;
	long int F=0;
	long int index = 0, end_index = 0;
	long int roller=0;
	index = block_p[bid].index;
	end_index = block_p[bid].end_index;
	for(i=index; i< end_index - 2; i++)
	{
		roller++;
	}
	block_p[bid].no_of_loops = roller;
	block_p[bid].no_of_tri = F;
}

The block_properties structure is:

struct block_properties
{
	unsigned int siteidx, counter, deltay, deltax;
	long int no_of_tri, no_of_loops;
	long int t1, t2, t3, t4;
	long int l1, l2, l3, l4;
	long int index;
	long int end_index;
};

I don’t know how to find the error. When I run the program it doesn’t report any error; it just seems to skip the function above and go on to execute the other functions.

I am using Ubuntu 8.04 with a Tesla C870. Can I use the CUDA debugger?

Thanks,

Sambi

tmurray · December 1, 2008, 3:56am

The debugger doesn’t work on the C870 (and I have no idea whether it works with Ubuntu 8.04).

Are you checking error codes?

Sambi · December 1, 2008, 4:10am

It doesn’t give me any error while executing. When I copy block_properties back to the host and print it, all I get is zeros.
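To judge whether the copied-back zeros are wrong, it helps to know what block_p should hold after both kernels run. The sketch below is a minimal host-side reference, assuming the per-block counter values are available on the host (the thread does not show where they are computed). It repeats the prefix sum that move_points uses for index and end_index, and the trip count of the loop that cuda_delaunay_func stores in no_of_loops. The expected_block type and expected_for_block function are hypothetical names for illustration.

```c
/* Hypothetical host-side record of the values the two kernels should
   leave in block_p[bid]. */
typedef struct {
    long index, end_index, no_of_loops;
} expected_block;

/* Recomputes, for block bid, what move_points writes to index/end_index
   and what cuda_delaunay_func writes to no_of_loops.
   Assumes counter[] holds the same per-block counts as block_p[].counter. */
static expected_block expected_for_block(const unsigned *counter, int bid)
{
    expected_block e;
    long index = 0;
    for (int i = 0; i < bid; i++)      /* same prefix sum as move_points */
        index += (long)counter[i] + 3;
    index++;                           /* move_points: index ++; */
    e.index = index;
    e.end_index = index + (long)counter[bid];
    /* trip count of: for (i = index; i < end_index - 2; i++) roller++; */
    e.no_of_loops = (e.end_index - 2 > index) ? e.end_index - 2 - index : 0;
    return e;
}
```

If the copied-back block_p[bid].index, end_index and no_of_loops differ from these values for blocks with a nonzero counter, the kernels most likely never completed.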

tmurray · December 1, 2008, 4:27am

Are you checking what cudaThreadSynchronize returns?

alex_dubinsky · December 1, 2008, 4:34am

Keep in mind that the SDK error-checking functions only work in Debug builds.

Sambi · December 1, 2008, 4:57am

Yes, I get "unspecified launch failure" for the following function, which runs before the one given above. The same error is then reported for the function above as well.

__global__ void move_points(point2 *pData, uint elements, limit *cuda_values, int blocks, delaunay_struct *cu_de, block_properties *block_p)
{
	unsigned int index=0, end_index;
	int bid = blockIdx.x;
	int i;
	for (i=0; i< bid; i++)
	{
		index = index + block_p[i].counter + 3;
	}
	index ++;
	end_index = index + block_p[bid].counter;
	block_p[bid].siteidx = index;
	block_p[bid].index = index;
	block_p[bid].end_index = end_index;
	block_p[bid].t1 = block_p[bid].t2 = block_p[bid].t3 = block_p[bid].t4 = 0;
	block_p[bid].l1 = block_p[bid].l2 = block_p[bid].l3 = block_p[bid].l4 = 0;
	for (i=0; i< elements; i++)
	{
		if(bid == 0)
		{
			if(pData[i].y>=cuda_values[bid].bottom && pData[i].y <= cuda_values[bid+1].top)
			{
				cu_de[index].x = pData[i].x;
				cu_de[index].y = pData[i].y;
				cu_de[index].z = cu_de[index].x*cu_de[index].x + cu_de[index].y * cu_de[index].y;
				index++;
			}
		}
		else if(bid == (blocks-1))
		{
			if(pData[i].y>=cuda_values[bid-1].bottom && pData[i].y <= cuda_values[bid].top)
			{
				cu_de[index].x = pData[i].x;
				cu_de[index].y = pData[i].y;
				cu_de[index].z = cu_de[index].x*cu_de[index].x + cu_de[index].y * cu_de[index].y;
				index++;
			}
		}
		else
		{
			if(pData[i].y>=cuda_values[bid-1].bottom && pData[i].y <= cuda_values[bid+1].top)
			{
				cu_de[index].x = pData[i].x;
				cu_de[index].y = pData[i].y;
				cu_de[index].z = cu_de[index].x*cu_de[index].x + cu_de[index].y * cu_de[index].y;
				index++;
			}
		}
	}
}

Why isn’t there anything about cudaThreadSynchronize in the Programming Guide 2.0?

tmurray · December 1, 2008, 5:55am

It is explained. Go read section 4.5.15.

Also, "unspecified launch failure" means you have the GPU equivalent of a segfault (an invalid memory access). Run your kernel through valgrind in emulation mode.
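One candidate for that invalid access in move_points is the cu_de writes: each block starts at a prefix sum of counter + 3 and then increments index once per point in its overlapping band, with no check against end_index or the size of cu_de. The host-side C sketch below replays that arithmetic so the layout can be checked (or run under valgrind) without the device. The point2 and limit layouts, the capacity argument (number of cu_de entries allocated), and the function names are hypothetical; the thread does not show them.

```c
#include <stddef.h>

/* Hypothetical host-side copies of the types used by move_points;
   the real definitions are not shown in the thread. */
typedef struct { float x, y; } point2;
typedef struct { float top, bottom; } limit;

/* Number of points block bid copies into cu_de, using the kernel's band
   test: bid 0 uses lim[0].bottom..lim[1].top, the last block uses
   lim[bid-1].bottom..lim[bid].top, others lim[bid-1].bottom..lim[bid+1].top.
   Assumes blocks >= 2 (with one block the kernel would read lim[1]). */
static size_t band_count(const point2 *p, size_t n, const limit *lim,
                         int bid, int blocks)
{
    int lo = (bid == 0) ? 0 : bid - 1;
    int hi = (bid == blocks - 1) ? bid : bid + 1;
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i].y >= lim[lo].bottom && p[i].y <= lim[hi].top)
            c++;
    return c;
}

/* Replays move_points' indexing: block bid starts writing at
   1 + sum over j < bid of (counter[j] + 3) and writes one entry per point
   in its band. Returns the first block that writes more entries than its
   counter (i.e. past end_index) or past `capacity` entries, else -1. */
static int first_bad_block(const point2 *p, size_t n, const limit *lim,
                           const unsigned *counter, int blocks,
                           size_t capacity)
{
    size_t prefix = 0;
    for (int bid = 0; bid < blocks; bid++) {
        size_t start = prefix + 1;
        size_t writes = band_count(p, n, lim, bid, blocks);
        if (writes > counter[bid] || start + writes > capacity)
            return bid;
        prefix += (size_t)counter[bid] + 3;
    }
    return -1;
}
```

Because neighbouring bands overlap, a point can be counted by up to three blocks, so counter has to be computed with exactly the same band test for the writes to stay inside each block's slots.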