I think you are right about the pointer thing. However, that should not be the problem.
I’ll paste some code here. I’m compiling it with -deviceemu, and when I run it I get "unspecified launch failure." I don’t want to run it without -deviceemu because my PC has already crashed too many times.
Before the code, I want to sum things up:
- I have a class which deals with the kernel launch. It’s a normal class; only some of its methods are preceded by __device__.
- I don’t use any “extern” declarations. My kernel is written directly in the ‘.cu’ file of the class, which lets me write a member function in the class that calls the kernel.
- The kernel receives a ‘this’ pointer. The pointer is then used to call device member functions. The function I’m calling from the kernel is the one causing the problems!
- In the ‘.cuh’ file of the class I also have some inline operators defined to deal with float3 types. They are outside the class definition.
- According to deviceQuery, I have a GPU with compute capability 1.2. I’m not defining any ‘arch=’ option in the makefile.
//********************************************************* Generator.cu
int Generator::intersectRayTriangle(ray& R, triangle& T, float3* I){
    float3 u, v, n;
    float3 dir, w0;//, w;
    float r, a, b;
    // get triangle edge vectors and plane normal
    u = T.V1 - T.V0;
    v = T.V2 - T.V0;
    n = u * v;
    if(n == make_float3(0, 0, 0)){ // triangle is degenerate
        return -1;
    }
    dir = R.P1 - R.P0;
    w0 = R.P0 - T.V0;
    a = - dot(n, w0);
    b = dot(n, dir);
    if( fabs(b) < SMALL_NUM ){ // ray is parallel
        if(a == 0){
            return 2; // ray lies in triangle plane
        }else{
            return 0; // ray is disjoint from plane
        }
    }
    r = a / b;
    if(r < 0.0){
        return 0; // ray goes away from triangle
    }
    // for a segment, also test if (r > 1.0) => no intersect
    (*I) = R.P0 + (r * dir);
    return 1;
}
__global__
void kernel(Generator* drr_gen, ModelOBJ::Vertex* vertex_buffer, int n_triangles,
            float* result){
    drr_gen->doSomething();
    // block indexes
    int bx = blockIdx.x; int by = blockIdx.y;
    // thread indexes
    int tx = threadIdx.x; int ty = threadIdx.y;
    // matrix indexes
    int line = by * blockDim.y + ty;
    int column = bx * blockDim.x + tx;
    // flat array index
    int index = line * DRR_WIDTH + column;
    int stepX = blockDim.x * gridDim.x;
    int stepY = blockDim.y * gridDim.y;
    // data
    DRR_Generator::triangle tri;
    DRR_Generator::ray R;
    R.P0 = make_float3(0.0, 0.0, 0.0);
    R.P1 = make_float3(0.0, 2.0, 0.0);
    float3 origin = make_float3(0.0, 0.0, 0.0);
    float3 Point = make_float3(0.0, 0.0, 0.0);
    float3* Point_ptr = &Point;
    // each block of threads iterates over a corresponding block of the DRR or more
    for(int i = line; i < DRR_HEIGHT; i += stepY){
        for(int j = column; j < DRR_WIDTH; j += stepX){
            // "one" thread per pixel
            index = i * DRR_WIDTH + j;
            // each thread goes over all triangles
            // and tries to find intersection with rays;
            for(int i = 0; i < n_triangles; i++){
                tri.V0 = make_float3((vertex_buffer[i]).position[0],
                    (vertex_buffer[i]).position[1], (vertex_buffer[i]).position[2]);
                tri.V1 = make_float3((vertex_buffer[i+1]).position[0],
                    (vertex_buffer[i+1]).position[1], (vertex_buffer[i+1]).position[2]);
                tri.V2 = make_float3((vertex_buffer[i+2]).position[0],
                    (vertex_buffer[i+2]).position[1], (vertex_buffer[i+2]).position[2]);
                int temp = drr_gen->intersectRayTriangle(R, tri, Point_ptr);
                if(temp == 1){
                    float distance = fabs(eucDistance(origin, Point));
                    result[index] += distance;
                }
            }
        }
    }
    return;
}
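To convince myself that the two outer loops of the kernel visit every pixel exactly once, here is a minimal host-only sketch that emulates them for every (block, thread) pair. It is only an illustration: WIDTH and HEIGHT are made-up stand-ins for DRR_WIDTH and DRR_HEIGHT, and countVisits is a hypothetical helper that is not part of my class.

```cpp
#include <cassert>
#include <vector>

// Made-up sizes for this illustration only.
const int WIDTH = 10;   // stands in for DRR_WIDTH
const int HEIGHT = 7;   // stands in for DRR_HEIGHT

// Emulates the kernel's two outer loops on the host for every
// (block, thread) pair and counts how often each pixel is visited.
std::vector<int> countVisits(int gridX, int gridY, int blockX, int blockY) {
    std::vector<int> visits(WIDTH * HEIGHT, 0);
    const int stepX = blockX * gridX;  // blockDim.x * gridDim.x
    const int stepY = blockY * gridY;  // blockDim.y * gridDim.y
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    const int line = by * blockY + ty;
                    const int column = bx * blockX + tx;
                    // Same strided loops as in the kernel.
                    for (int i = line; i < HEIGHT; i += stepY)
                        for (int j = column; j < WIDTH; j += stepX)
                            ++visits[i * WIDTH + j];
                }
    return visits;
}
```

Calling countVisits(2, 2, 2, 2) emulates a 2x2 grid of 2x2 blocks, which has fewer threads than pixels; every entry of the returned vector should then be 1, so the indexing itself does not look like the source of the failure.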
//********************************************************* Generator.cuh
class Generator{
public:
    ...
    __host__ __device__
    int intersectRayTriangle(ray& R, triangle& T, float3* I);
    ...
};
// ********************** operators
inline
__host__ __device__
float dot(float3 a, float3 b){
    return (a.x * b.x + a.y * b.y + a.z * b.z);
}
inline
__host__ __device__
float3 operator +(float3 a, float3 b){
    return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
inline
__host__ __device__
float3 operator -(float3 a, float3 b){
    return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
inline
__host__ __device__
float3 operator*(float3 a, float3 b){
    return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
inline
__host__ __device__
bool operator==(float3 a, float3 b){
    return (a.x == b.x && a.y == b.y && a.z == b.z);
}
inline
__host__ __device__
bool operator!=(float3 a, float3 b){
    return (a.x != b.x || a.y != b.y || a.z != b.z);
}
inline
__host__ __device__
float3 operator*(float a, float3 b){
    return make_float3(a * b.x, a * b.y, a * b.z);
}
inline
__host__ __device__
float3 operator*(float3 b, float a){
    return make_float3(a * b.x, a * b.y, a * b.z);
}
inline
__host__ __device__
float eucDistance(float3 p1, float3 p2){
    float rx = p1.x - p2.x;
    float ry = p1.y - p2.y;
    float rz = p1.z - p2.z;
    return sqrtf(rx * rx + ry * ry + rz * rz);
}
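A side note on the operators: operator*(float3, float3) multiplies component by component, while the comment in intersectRayTriangle expects u * v to be the plane normal, which would be a cross product. I don’t know whether this is related to the launch failure, but the following host-only sketch shows the difference. Vec3, mulComponents and cross are hypothetical stand-ins (not part of my code or of CUDA) so that it builds with a plain C++ compiler.

```cpp
#include <cassert>

// Stand-in for CUDA's float3, so this builds without the CUDA headers.
struct Vec3 { float x, y, z; };

// Component-wise product, mirroring operator*(float3, float3).
inline Vec3 mulComponents(Vec3 a, Vec3 b) {
    return Vec3{a.x * b.x, a.y * b.y, a.z * b.z};
}

// Cross product: the result is perpendicular to both a and b,
// which is what a plane normal needs to be.
inline Vec3 cross(Vec3 a, Vec3 b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}
```

For the edge vectors u = (1, 0, 0) and v = (0, 1, 0), mulComponents(u, v) is (0, 0, 0), which intersectRayTriangle would report as a degenerate triangle, whereas cross(u, v) is (0, 0, 1).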
Now, the thing that I think is an indicator of the problem:
- The code above, when compiled, gives me: “ptxas /tmp/tmpxft_00000250_00000000-2_Generator.ptx, line 186; warning : Double is not supported. Demoting to float”
But if I comment out all the return statements in the “intersectRayTriangle” function except the final one (that is, the ones inside the ifs), compilation produces no warnings!
I don’t know what is going on.
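One thing I may need to rule out (an assumption on my part, not a confirmed diagnosis): comparisons such as r < 0.0 mix a float with an unsuffixed literal, which C++ evaluates in double, and compute capability 1.2 has no double-precision support. If the compiler drops those comparisons as dead code once the early returns are commented out, that could explain why the warning disappears. The host-only sketch below only shows the type promotion; isBehindRayOrigin is a hypothetical helper, not part of my class.

```cpp
#include <cassert>
#include <type_traits>

// A float combined with an unsuffixed literal is promoted to double
// by the usual arithmetic conversions.
static_assert(std::is_same<decltype(1.0f + 0.0), double>::value,
              "float with a double literal is evaluated in double");

// With the 'f' suffix the expression stays in single precision.
static_assert(std::is_same<decltype(1.0f + 0.0f), float>::value,
              "float with a float literal stays float");

// Hypothetical float-only variant of the `r < 0.0` test.
inline bool isBehindRayOrigin(float r) {
    return r < 0.0f;  // float literal, so no promotion to double
}
```

If this guess is right, writing the literals as 0.0f in intersectRayTriangle should make the ptxas warning go away without removing any return statement.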
I would much appreciate it if someone had the patience to read this long post and help me.
Thanks again,
André.