Device function pointer argument

thyandrecardoso · September 6, 2010, 7:29pm

Hi,

I’m trying to do something along these lines:

[codebox]
__device__ int foo(float3* Ret){
	*Ret.x = 14;
	…
	return 0;
}

__global__ void kernel(){
	float3 point = make_float3(0.0, 0.0, 0.0);
	float3* point_ptr = &point;
	foo(point_ptr);
}
[/codebox]

But I end up with an error (“unspecified launch failure”) when I do this. Specifically, I get the error when I access the pointer inside the “foo” function. If I remove the lines that manipulate the pointer, the program runs fine.

Can I do this?

Thank you.

André.

thyandrecardoso · September 6, 2010, 8:25pm

By the way, I should mention I’m using CUDA on a MacBook Pro with a 330M graphics card. The CUDA driver version is 3.1.14.

Thank you,
André

asegovia · September 7, 2010, 12:45am

I think you have an operator precedence problem. Within foo, you should reference x either as “Ret->x” or by enclosing the dereference in parentheses: “(*Ret).x”. Otherwise the member access binds first and you are just dereferencing whatever Ret.x is storing, which triggers a segfault in video memory (also known as an unspecified launch failure).
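A minimal sketch of the two spellings suggested above, written so that it also builds as plain C++ on the host. The stand-ins inside the #ifndef __CUDACC__ guard and the helper foo_writes_through_pointer are assumptions added for illustration and are not part of the original post. Since “.” binds tighter than unary “*”, the expression *Ret.x parses as *(Ret.x); with Ret declared as a pointer, a standard C++ compiler would be expected to reject that line outright.

```cpp
// Host-side stand-ins so the sketch builds without nvcc (assumption);
// under nvcc the real float3 and qualifiers from the CUDA headers are used.
#ifndef __CUDACC__
#define __host__
#define __device__
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { float3 v = {x, y, z}; return v; }
#endif

// Writes through the pointer using both valid spellings.
__host__ __device__ inline int foo(float3* Ret) {
    Ret->x = 14.0f;      // arrow operator: dereference and select the member
    (*Ret).y = 2.0f;     // explicit parentheses: dereference first, then select
    // *Ret.x = 14.0f;   // parses as *(Ret.x): member access on a pointer
    return 0;
}

// Hypothetical self-check: the caller's float3 is modified through the pointer.
inline bool foo_writes_through_pointer() {
    float3 point = make_float3(0.0f, 0.0f, 0.0f);
    foo(&point);
    return point.x == 14.0f && point.y == 2.0f && point.z == 0.0f;
}
```

Taking the address of a local variable and passing it to a device function, as the kernel in the first post does, is fine in itself; only the member access needs the corrected spelling.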

Hope this helps.

Alejandro.-

thyandrecardoso · September 7, 2010, 2:46pm

I think you are right about the pointer thing. However, that should not be the problem.

I’ll paste some code here. I’m compiling this with -deviceemu, and it gives me “unspecified launch failure” when I run it. I don’t want to run it without -deviceemu because my PC has already crashed too many times.

Before the code, I want to sum things up:

I have a class that handles the kernel launch. It’s a normal class; the only special thing is that some of its methods are preceded by __device__.
I don’t use any “extern” things. My kernel is written directly in the ‘.cu’ file of the class, which lets me write a member function in the class that calls the kernel.
The kernel receives a ‘this’ pointer. The pointer is then used to call device member functions. The function I’m calling from the kernel is the one causing the problems!
In the ‘.cuh’ file of the class I also have some inline operators defined to deal with float3 types. They are outside the class definition.
According to deviceQuery I have a GPU with compute capability 1.2. I’m not setting any ‘arch=’ option in the makefile.

[codebox]
//********************************************************* Generator.cu

int Generator::intersectRayTriangle(ray& R, triangle& T, float3* I){
	float3 u, v, n;
	float3 dir, w0;//, w;
	float r, a, b;

	// get triangle edge vectors and plane normal
	u = T.V1 - T.V0;
	v = T.V2 - T.V0;
	n = u * v;
	if(n == make_float3(0, 0, 0)){ // triangle is degenerate
		return -1;
	}

	dir = R.P1 - R.P0;
	w0 = R.P0 - T.V0;
	a = - dot(n, w0);
	b = dot(n, dir);
	if( fabs(b) < SMALL_NUM ){ // ray is parallel
		if(a == 0){
			return 2; // ray lies in triangle plane
		}else{
			return 0; // ray is disjoint from plane
		}
	}

	r = a / b;
	if(r < 0.0){
		return 0; // ray goes away from triangle
	}
	// for a segment, also test if (r > 1.0) => no intersect

	(*I) = R.P0 + (r * dir);
	return 1;
}

__global__
void kernel(Generator* drr_gen, ModelOBJ::Vertex* vertex_buffer, int n_triangles,
		float* result){
	drr_gen->doSomething();

	// block indexes
	int bx = blockIdx.x; int by = blockIdx.y;
	// thread indexes
	int tx = threadIdx.x; int ty = threadIdx.y;
	// matrix indexes
	int line = by * blockDim.y + ty;
	int column = bx * blockDim.x + tx;
	// flat array index
	int index = line * DRR_WIDTH + column;
	int stepX = blockDim.x * gridDim.x;
	int stepY = blockDim.y * gridDim.y;

	// data
	DRR_Generator::triangle tri;
	DRR_Generator::ray R;
	R.P0 = make_float3(0.0, 0.0, 0.0);
	R.P1 = make_float3(0.0, 2.0, 0.0);
	float3 origin = make_float3(0.0, 0.0, 0.0);
	float3 Point = make_float3(0.0, 0.0, 0.0);
	float3* Point_ptr = &Point;

	// each block of threads iterates over a corresponding block of the DRR or more
	for(int i = line; i < DRR_HEIGHT; i += stepY){
		for(int j = column; j < DRR_WIDTH; j += stepX){
			// "one" thread per pixel
			index = i * DRR_WIDTH + j;

			// each thread goes over all triangles
			// and tries to find intersection with rays;
			for(int i = 0; i < n_triangles; i++){
				tri.V0 = make_float3((vertex_buffer[i]).position[0],
					(vertex_buffer[i]).position[1], (vertex_buffer[i]).position[2]);
				tri.V1 = make_float3((vertex_buffer[i+1]).position[0],
					(vertex_buffer[i+1]).position[1], (vertex_buffer[i+1]).position[2]);
				tri.V2 = make_float3((vertex_buffer[i+2]).position[0],
					(vertex_buffer[i+2]).position[1], (vertex_buffer[i+2]).position[2]);

				int temp = drr_gen->intersectRayTriangle(R, tri, Point_ptr);
				if(temp == 1){
					float distance = fabs(eucDistance(origin, Point));
					result[index] += distance;
				}
			}
		}
	}
	return;
}

//********************************************************* Generator.cuh

class Generator{
public:
...
	__host__ __device__ int intersectRayTriangle(ray& R, triangle& T, float3* I);
...
};

// ********************** operators

inline __host__ __device__
float dot(float3 a, float3 b){
	return (a.x * b.x + a.y * b.y + a.z * b.z);
}

inline __host__ __device__
float3 operator +(float3 a, float3 b){
	return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}

inline __host__ __device__
float3 operator -(float3 a, float3 b){
	return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}

inline __host__ __device__
float3 operator*(float3 a, float3 b){
	return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}

inline __host__ __device__
bool operator==(float3 a, float3 b){
	return (a.x == b.x && a.y == b.y && a.z == b.z);
}

inline __host__ __device__
bool operator!=(float3 a, float3 b){
	return (a.x != b.x || a.y != b.y || a.z != b.z);
}

inline __host__ __device__
float3 operator*(float a, float3 b){
	return make_float3(a * b.x, a * b.y, a * b.z);
}

inline __host__ __device__
float3 operator*(float3 b, float a){
	return make_float3(a * b.x, a * b.y, a * b.z);
}

inline __host__ __device__
float eucDistance(float3 p1, float3 p2){
	float rx = p1.x - p2.x;
	float ry = p1.y - p2.y;
	float rz = p1.z - p2.z;
	return sqrtf(rx * rx + ry * ry + rz * rz);
}
[/codebox]
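A side note on the line n = u * v in intersectRayTriangle: the comment there asks for the plane normal, but the float3 operator* defined above multiplies component by component. A plane normal needs the cross product of the two edge vectors. The sketch below is only an illustration under assumptions: the helper name cross and the host-side stand-ins are hypothetical and not part of the original post. For the edges u = (1, 0, 0) and v = (0, 1, 0) the component-wise product is (0, 0, 0), which the posted code would report as a degenerate triangle, while the cross product is (0, 0, 1).

```cpp
// Host-side stand-ins so the sketch builds without nvcc (assumption);
// under nvcc the real float3 and qualifiers from the CUDA headers are used.
#ifndef __CUDACC__
#define __host__
#define __device__
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { float3 v = {x, y, z}; return v; }
#endif

// Same dot product as in the post.
inline __host__ __device__
float dot(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical helper: vector cross product, perpendicular to both inputs.
inline __host__ __device__
float3 cross(float3 a, float3 b) {
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Hypothetical helper: normal of the triangle (V0, V1, V2) from its two edge vectors.
inline __host__ __device__
float3 triangle_normal(float3 V0, float3 V1, float3 V2) {
    float3 u = make_float3(V1.x - V0.x, V1.y - V0.y, V1.z - V0.z);
    float3 v = make_float3(V2.x - V0.x, V2.y - V0.y, V2.z - V0.z);
    return cross(u, v);
}
```

With such a helper, the degenerate-triangle test compares a true normal against zero, and the later dot products with n measure distance along that normal.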

Now, the thing that I think is an indicator of the problem:

The code above, when compiled, gives me: “ptxas /tmp/tmpxft_00000250_00000000-2_Generator.ptx, line 186; warning : Double is not supported. Demoting to float”

But if I comment out all the return statements in the “intersectRayTriangle” function except the final one (that is, the ones inside the ifs), compilation produces no warnings!

I don’t know what is going on.

I would much appreciate it if someone had the patience to read this long post and help me.

Thanks again,

André.
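One more thing that may be worth checking in the kernel above is how triangles are read from vertex_buffer. The inner loop reads vertices i, i+1 and i+2 for i from 0 to n_triangles - 1, so consecutive iterations share two vertices. If the buffer is actually a flat triangle list with three separate vertices per triangle (an assumption, the post does not say), triangle t would start at vertex 3 * t instead. A rough sketch of that indexing, where Vertex is a hypothetical stand-in for ModelOBJ::Vertex and load_triangle is a made-up helper:

```cpp
// Host-side stand-ins so the sketch builds without nvcc (assumption);
// under nvcc the real float3 and qualifiers from the CUDA headers are used.
#ifndef __CUDACC__
#define __host__
#define __device__
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { float3 v = {x, y, z}; return v; }
#endif

// Hypothetical stand-in for ModelOBJ::Vertex; only the position field is assumed.
struct Vertex { float position[3]; };

// Hypothetical stand-in for the triangle type used in the post.
struct triangle { float3 V0, V1, V2; };

// Copies one vertex position into a float3.
inline __host__ __device__
float3 to_float3(const Vertex& v) {
    return make_float3(v.position[0], v.position[1], v.position[2]);
}

// Loads triangle t from a flat triangle list: vertices 3t, 3t+1, 3t+2.
// The buffer must hold at least 3 * (t + 1) vertices.
inline __host__ __device__
triangle load_triangle(const Vertex* vertex_buffer, int t) {
    triangle tri;
    tri.V0 = to_float3(vertex_buffer[3 * t]);
    tri.V1 = to_float3(vertex_buffer[3 * t + 1]);
    tri.V2 = to_float3(vertex_buffer[3 * t + 2]);
    return tri;
}
```

Whichever layout is in use, the largest index read has to stay inside the allocation, since an out-of-range read on the device is one common source of an unspecified launch failure.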

SPWorley · September 7, 2010, 7:53pm

The “doubles not supported” warning the compiler’s giving you is correct, and it is just a warning. You’re initializing float3s with doubles (“make_float3(0.0, 0.0, 0.0)”). This is harmless since the compiler is demoting them to floats for you.

Just use proper floats (“make_float3(0.0f, 0.0f, 0.0f)”) to prevent the warning.
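To make the literal types concrete, here is a small sketch (it needs C++11 for static_assert and decltype, and the helper goes_away_from_triangle is hypothetical). It also shows that a comparison like the r < 0.0 in intersectRayTriangle is carried out in double, because the float operand is promoted to match the double literal. That could be one reason the warning went away when the early returns were commented out, if the compiler then dropped the unused comparisons, though that part is a guess.

```cpp
#include <type_traits>

// An unsuffixed floating-point literal is a double; the f suffix makes it a float.
static_assert(std::is_same<decltype(0.0), double>::value, "0.0 is a double literal");
static_assert(std::is_same<decltype(0.0f), float>::value, "0.0f is a float literal");

// Mixing a float with a double literal promotes the expression to double,
// and the same conversion is applied to the operands of a comparison.
static_assert(std::is_same<decltype(1.0f + 0.0), double>::value, "float + double is double");
static_assert(std::is_same<decltype(1.0f + 0.0f), float>::value, "float + float is float");

// Hypothetical float-only version of the sign test from intersectRayTriangle.
inline bool goes_away_from_triangle(float r) {
    return r < 0.0f;   // both operands are float, so no double arithmetic is involved
}
```

Applying the f suffix to every literal in device code (including the ones in comparisons) keeps the arithmetic in single precision on hardware without double support.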
