Device function pointer argument

Hi,

I’m trying to do something along these lines:

[codebox]

device int foo(float3* Ret){

*Ret.x = 14;

return 0;

}

global void kernel(){

float3 point = make_float3(0.0, 0.0, 0.0);

float3* point_ptr = &point;

foo(point_ptr);

}

[/codebox]

But I end up with an error (“unspecified launch failure.”) when I do this. Specifically, I get this error when on the “foo” function I access the pointer. If I remove the lines that manipulate the pointer, the program runs fine.

Can I do this?

Thank you.

André.

By the way, I should mention I’m using CUDA on a macbook pro, with a 330M graphics card. CUDA driver version is 3.1.14.

Thank you,
André

By the way, I should mention I’m using CUDA on a macbook pro, with a 330M graphics card. CUDA driver version is 3.1.14.

Thank you,
André

I think you have an operator precedence problem. Within foo, you should either reference x by: “Ref->x” or by enclosing the dereference operator in parentesis: “(*Ref).x”, otherwise you are just dereferencing whatever Ref.x is storing, which triggers a segfault in video memory (also known as an unspecified launch failure).

Hope this helps.

Alejandro.-

I think you have an operator precedence problem. Within foo, you should either reference x by: “Ref->x” or by enclosing the dereference operator in parentesis: “(*Ref).x”, otherwise you are just dereferencing whatever Ref.x is storing, which triggers a segfault in video memory (also known as an unspecified launch failure).

Hope this helps.

Alejandro.-

I think you are right about the pointer thing. However, that should not be the problem.

I’ll paste some code here. I’m compiling this with -deviceemu, and it gives me " unspecified launch failure." when I run it. I don’t want to run it without -deviceemu because my pc just crashed too many times External Image

Before the code, I want to sum things up:

  • I have a class which deals with kernel launch. It’s a normal class, I only have there some methods which are __device__preceded.

  • I don’t use any “extern” things. My kernel is written directly in the ‘.cu’ of the class, which lets me write a member function in the class that calls the kernel.

  • The kernel receives a ‘this’ pointer. The pointer is then used to call device member functions. The function which I’m calling from the kernel is the one causing the problems!

  • In the ‘.cuh’ of the class I have also some inline operators defined to deal with float3 types. They are outside the class definition.

  • according to the deviceQuery I have a GPU with compute capability 1.2. I’m not defining any ‘arch=’ thing in the makefile.

//********************************************************* Generator.cu

int Generator::intersectRayTriangle(ray& R, triangle& T, float3* I){

	float3 u, v, n;

	float3 dir, w0;//, w;

	float r, a, b;

	// get triangle edge vectors and plane normal

	u = T.V1 - T.V0;

	v = T.V2 - T.V0;

	n = u * v;

	if(n == make_float3(0, 0, 0)){ // triangle is degenerate

		return -1;

	}

	dir = R.P1 - R.P0;

	w0 = R.P0 - T.V0;

	a = - dot(n, w0);

	b = dot(n, dir);

	if( fabs(b) < SMALL_NUM ){ // ray is parallel

		if(a == 0){

			return 2; // ray lies in triangle plane

		}else{

			return 0; // ray is disjoint from plane

		}

	}

	r = a / b;

	if(r < 0.0){

		return 0; // ray goes away from triangle

	}

	// for a segment, also test if (r > 1.0) => no intersect

	(*I) = R.P0 + (r * dir);

	return 1;

}

__global__ 

void kernel(Generator* drr_gen, ModelOBJ::Vertex* vertex_buffer, int n_triangles,

		float* result){

		drr_gen->doSomething();

	// block indexes

	int bx = blockIdx.x; int by = blockIdx.y;

	//thread indexes

	int tx = threadIdx.x; int ty = threadIdx.y;

	// matrix indexes

	int line = by * blockDim.y + ty;

	int column = bx * blockDim.x + tx;

	//flat array index

	int index = line * DRR_WIDTH + column;

	int stepX = blockDim.x * gridDim.x;

	int stepY = blockDim.y * gridDim.y;

	// data 

	DRR_Generator::triangle tri;

	DRR_Generator::ray R;

	R.P0 = make_float3(0.0, 0.0, 0.0);

	R.P1 = make_float3(0.0, 2.0, 0.0);

	float3 origin = make_float3(0.0, 0.0, 0.0);

	float3 Point = make_float3(0.0, 0.0, 0.0);

	float3* Point_ptr = &Point;

	// each block of threads iterates over a corresponding block of the DRR or more

	for(int i = line; i < DRR_HEIGHT; i += stepY){

		for(int j = column; j < DRR_WIDTH; j += stepX){

			// "one" thread per pixel

			index = i * DRR_WIDTH + j;

			

			// each thread goes over all triangles

			// and tries to find intersection with rays;

			for(int i = 0; i < n_triangles; i++){

				tri.V0 = make_float3((vertex_buffer[i]).position[0],

					(vertex_buffer[i]).position[1], (vertex_buffer[i]).position[2]);

				tri.V1 = make_float3((vertex_buffer[i+1]).position[0],

					(vertex_buffer[i+1]).position[1], (vertex_buffer[i+1]).position[2]);

				tri.V2 = make_float3((vertex_buffer[i+2]).position[0],

					(vertex_buffer[i+2]).position[1], (vertex_buffer[i+2]).position[2]);

				int temp = drr_gen->intersectRayTriangle(R, tri, Point_ptr);

				if(temp == 1){

					float distance = fabs(eucDistance(origin, Point));

					result[index] += distance;

				}

			}

		}

	}

	return;

}

//********************************************************* Generator.cuh

class Generator{

public:

...

 __host__ __device__

			int intersectRayTriangle(ray& T, triangle& T, float3* I);

...

};

// ********************** operators

inline

__host__ __device__

float dot(float3 a, float3 b){

	return (a.x * b.x + a.y * b.y + a.z * b.z);

}

inline

__host__ __device__

float3 operator +(float3 a, float3 b){

	return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);

}

inline

__host__ __device__

float3 operator -(float3 a, float3 b){

	return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);

}

inline

__host__ __device__

float3 operator*(float3 a, float3 b){

	return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);

}

inline

__host__ __device__

bool operator==(float3 a, float3 b){

	return (a.x == b.x && a.y == b.y && a.z == b.z);

}

inline

__host__ __device__

bool operator!=(float3 a, float3 b){

	return (a.x != b.x || a.y != b.y || a.z != b.z);

}

inline

__host__ __device__

float3 operator*(float a, float3 b){

	return make_float3(a * b.x, a * b.y, a * b.z);

}

inline

__host__ __device__

float3 operator*(float3 b, float a){

	return make_float3(a * b.x, a * b.y, a * b.z);

}

inline

__host__ __device__

float eucDistance(float3 p1, float3 p2){

	float rx = p1.x - p2.x;

	float ry = p1.y - p2.y;

	float rz = p1.z - p2.z;

	return sqrtf(rx * rx + ry * ry + rz * rz);

}

Now, the thing that I think is an indicator of the problem:

  • the code above, when compiled gives me: “ptxas /tmp/tmpxft_00000250_00000000-2_Generator.ptx, line 186; warning : Double is not supported. Demoting to float”

But, if I comment all the return statements in the “intersectRayTriangle” function except the final one (comment the ones inside the ifs), the compiling phase throws no warnings!

I don’t know what is going on.

I would be much appreciated if someone had the patience to read this long post and help me.

Thanks again,

André.

I think you are right about the pointer thing. However, that should not be the problem.

I’ll paste some code here. I’m compiling this with -deviceemu, and it gives me " unspecified launch failure." when I run it. I don’t want to run it without -deviceemu because my pc just crashed too many times External Image

Before the code, I want to sum things up:

  • I have a class which deals with kernel launch. It’s a normal class, I only have there some methods which are __device__preceded.

  • I don’t use any “extern” things. My kernel is written directly in the ‘.cu’ of the class, which lets me write a member function in the class that calls the kernel.

  • The kernel receives a ‘this’ pointer. The pointer is then used to call device member functions. The function which I’m calling from the kernel is the one causing the problems!

  • In the ‘.cuh’ of the class I have also some inline operators defined to deal with float3 types. They are outside the class definition.

  • according to the deviceQuery I have a GPU with compute capability 1.2. I’m not defining any ‘arch=’ thing in the makefile.

//********************************************************* Generator.cu

int Generator::intersectRayTriangle(ray& R, triangle& T, float3* I){

	float3 u, v, n;

	float3 dir, w0;//, w;

	float r, a, b;

	// get triangle edge vectors and plane normal

	u = T.V1 - T.V0;

	v = T.V2 - T.V0;

	n = u * v;

	if(n == make_float3(0, 0, 0)){ // triangle is degenerate

		return -1;

	}

	dir = R.P1 - R.P0;

	w0 = R.P0 - T.V0;

	a = - dot(n, w0);

	b = dot(n, dir);

	if( fabs(b) < SMALL_NUM ){ // ray is parallel

		if(a == 0){

			return 2; // ray lies in triangle plane

		}else{

			return 0; // ray is disjoint from plane

		}

	}

	r = a / b;

	if(r < 0.0){

		return 0; // ray goes away from triangle

	}

	// for a segment, also test if (r > 1.0) => no intersect

	(*I) = R.P0 + (r * dir);

	return 1;

}

__global__ 

void kernel(Generator* drr_gen, ModelOBJ::Vertex* vertex_buffer, int n_triangles,

		float* result){

		drr_gen->doSomething();

	// block indexes

	int bx = blockIdx.x; int by = blockIdx.y;

	//thread indexes

	int tx = threadIdx.x; int ty = threadIdx.y;

	// matrix indexes

	int line = by * blockDim.y + ty;

	int column = bx * blockDim.x + tx;

	//flat array index

	int index = line * DRR_WIDTH + column;

	int stepX = blockDim.x * gridDim.x;

	int stepY = blockDim.y * gridDim.y;

	// data 

	DRR_Generator::triangle tri;

	DRR_Generator::ray R;

	R.P0 = make_float3(0.0, 0.0, 0.0);

	R.P1 = make_float3(0.0, 2.0, 0.0);

	float3 origin = make_float3(0.0, 0.0, 0.0);

	float3 Point = make_float3(0.0, 0.0, 0.0);

	float3* Point_ptr = &Point;

	// each block of threads iterates over a corresponding block of the DRR or more

	for(int i = line; i < DRR_HEIGHT; i += stepY){

		for(int j = column; j < DRR_WIDTH; j += stepX){

			// "one" thread per pixel

			index = i * DRR_WIDTH + j;

			

			// each thread goes over all triangles

			// and tries to find intersection with rays;

			for(int i = 0; i < n_triangles; i++){

				tri.V0 = make_float3((vertex_buffer[i]).position[0],

					(vertex_buffer[i]).position[1], (vertex_buffer[i]).position[2]);

				tri.V1 = make_float3((vertex_buffer[i+1]).position[0],

					(vertex_buffer[i+1]).position[1], (vertex_buffer[i+1]).position[2]);

				tri.V2 = make_float3((vertex_buffer[i+2]).position[0],

					(vertex_buffer[i+2]).position[1], (vertex_buffer[i+2]).position[2]);

				int temp = drr_gen->intersectRayTriangle(R, tri, Point_ptr);

				if(temp == 1){

					float distance = fabs(eucDistance(origin, Point));

					result[index] += distance;

				}

			}

		}

	}

	return;

}

//********************************************************* Generator.cuh

class Generator{

public:

...

 __host__ __device__

			int intersectRayTriangle(ray& T, triangle& T, float3* I);

...

};

// ********************** operators

inline

__host__ __device__

float dot(float3 a, float3 b){

	return (a.x * b.x + a.y * b.y + a.z * b.z);

}

inline

__host__ __device__

float3 operator +(float3 a, float3 b){

	return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);

}

inline

__host__ __device__

float3 operator -(float3 a, float3 b){

	return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);

}

inline

__host__ __device__

float3 operator*(float3 a, float3 b){

	return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);

}

inline

__host__ __device__

bool operator==(float3 a, float3 b){

	return (a.x == b.x && a.y == b.y && a.z == b.z);

}

inline

__host__ __device__

bool operator!=(float3 a, float3 b){

	return (a.x != b.x || a.y != b.y || a.z != b.z);

}

inline

__host__ __device__

float3 operator*(float a, float3 b){

	return make_float3(a * b.x, a * b.y, a * b.z);

}

inline

__host__ __device__

float3 operator*(float3 b, float a){

	return make_float3(a * b.x, a * b.y, a * b.z);

}

inline

__host__ __device__

float eucDistance(float3 p1, float3 p2){

	float rx = p1.x - p2.x;

	float ry = p1.y - p2.y;

	float rz = p1.z - p2.z;

	return sqrtf(rx * rx + ry * ry + rz * rz);

}

Now, the thing that I think is an indicator of the problem:

  • the code above, when compiled gives me: “ptxas /tmp/tmpxft_00000250_00000000-2_Generator.ptx, line 186; warning : Double is not supported. Demoting to float”

But, if I comment all the return statements in the “intersectRayTriangle” function except the final one (comment the ones inside the ifs), the compiling phase throws no warnings!

I don’t know what is going on.

I would be much appreciated if someone had the patience to read this long post and help me.

Thanks again,

André.

The “doubles not supported” warning the compiler’s giving you is correct, and it is just a warning. You’re initializing float3s with doubles (“make_float3(0.0, 0.0, 0.0)”). This is harmless since the compiler is depreciating them for you.

Just use proper floats (“make_float3(0.0f, 0.0f, 0.0f)”) to prevent the warning.

The “doubles not supported” warning the compiler’s giving you is correct, and it is just a warning. You’re initializing float3s with doubles (“make_float3(0.0, 0.0, 0.0)”). This is harmless since the compiler is depreciating them for you.

Just use proper floats (“make_float3(0.0f, 0.0f, 0.0f)”) to prevent the warning.