Unspecified launch failure strange error, please help

hello,

i am very new to CUDA, so my apologies if my problem is obvious. i’ve been just getting familiar with cuda, and life was good until i started getting this “'unspecified launch error”. i’ve narrowed the problem down to 2 sections of code…if i uncomment either section it errors out and (sometimes) requires a hard reboot.

i’ve marked the problem sections with a ‘// <-------------’

// performs ray-sphere intersection

__device__ float getSphereIntersection(float3 *rayPos, float3 *rayDir, float3 *sPos, float sRadius) {

	float retVal = NO_INTERSECTION;

	float a, b, c, root, omega1, omega2;

	normalizeRay(rayPos, rayDir);

	float x = rayPos->x;

	float y = rayPos->y;

	float z = rayPos->z;

	float sphereX = sPos->x;

	float sphereY = sPos->y;

	float sphereZ = sPos->z;

	float dx = rayDir->x;

	float dy = rayDir->y;

	float dz = rayDir->z;

	a = (dx * dx) + (dy * dy) + (dz * dz);

	b = 2 * (dx * (x  - sphereX) + dy * (y - sphereY) + dz * (z - sphereZ));

	c = (x - sphereX) * (x - sphereX) + (y - sphereY) * (y - sphereY) + (z - sphereZ) * (z - sphereZ) -

  sRadius * sRadius;

	root = b*b - 4*c;

	// no intersection

	if (root < 0) {

  retVal = NO_INTERSECTION;

	}

	if (root == 0) {

  //retVal = -b + sqrtf(root)/2;

  //return -b;                                    //   <-------------

	} 

	

	if (root > 0) {

  root = sqrtf(root);

 omega1 = (-b + root)/(2);

  omega2 = (-b - root)/(2);

  

  // .01 to allow for error

  /*if ((omega1 < 0.01) && (omega2 < 0.01) ){  //   <-------------

  	retVal = NO_INTERSECTION;

  }else if( (omega2 < .01 || omega1 <= omega2) && (omega1 > 0.01)){

  	retVal = omega1;

  }else if( (omega1 < .01 || omega2 <= omega1) && (omega2 > 0.01)){

  	retVal = omega2;

  }

  */

	}

	return retVal;

}

let me know if more code is required. any help would be greatly appreciated!

I suspect the complete code would be required to solve the problem. Truthfully, you may have to bang your head on this sort of thing for a number of hours. When you see “Unspecified launch failure”, you should often translate it to “Segmentation fault.” Just as on a more familiar CPU, a bad memory reference will sometimes cause a segmentation fault, and sometimes just quietly trash some memory. So eliminating the launch failure does not guarantee that the code works correctly. Remember also that commenting out lines of code sometimes has a more global effect than you might expect, so cause and effect can be tricky to narrow down.

I assume the code works in emulation mode? (If not, of course you should fix it there with the help of the debugger.) The most frustrating problems are those that occur only on the actual device. This can happen for lots of reasons. Besides synchronization issues, the most common one I see people report is pointer memory space issues. This is confusion by the coder (or the compiler!) about host versus global versus constant versus shared memory spaces. If you are using CUDA 1.1, the compiler will at least warn you if it is too confused (in CUDA 1.0, it used to sometimes do the wrong thing silently).

thanks for the help. unfortunately it works in emulation so i’m pretty stuck. ive been banging my head on this wall for a while now :-(

it’s just weird that the lines that “appear” to be causing the problems are simple returns and assignments to local variables…

More context is needed here, since the device function gets inlined by the compiler. Can you post more complete source code, or at least the global function that invokes the device function? When you change return statements of inlined functions, that can have very broad implications for the generated code. When you comment out a return statement, all of the code that the return statement depends on becomes subject to dead code elimination…

I don’t suppose you’re dividing by 0 in normalizeRay()?

I’ve been looking at the examples that come with the sdk, and they seem to avoid functions that return floats. i realize it’s a rather sad attempt to fix it, but hey, we (i) get desparate after things just don’t work. so, i have since changed the function to void and it now takes another argument, which it modifies to the desired value. unfortunately, it still gives the same error.

…sorry for this long code section.

#define BLOCKDIM_X 200

#define BLOCKDIM_Y 2

__device__ inline void getSphereIntersection(float &dist, float3 *rayPos, float3 *rayDir, float3 *sPos, float sRadius) {

	float retVal = 0.0f;

	float a, b, c, root, omega1, omega2;

	//a = (dx * dx) + (dy * dy) + (dz * dz);

	b = 2 * (rayDir->x * (rayPos->x  - sPos->x) + rayDir->y * (rayPos->y - sPos->y) + rayDir->z * (rayPos->z - sPos->z));

	c = (rayPos->x - sPos->x) * (rayPos->x - sPos->x) + (rayPos->y - sPos->y) * (rayPos->y - sPos->y) + 

  (rayPos->z - sPos->z) * (rayPos->z - sPos->z) -	sRadius * sRadius;

	root = b*b - 4*c;

	// no intersection

	if (root < 0) {

  retVal = NO_INTERSECTION;

	}

	if (root == 0) {

  //retVal = -b + sqrtf(root)/2;

  //retVal = -b;                                    //   <-------------

	} 

	

	if (root > 0) {

  root = sqrtf(root);

 omega1 = (-b + root)/(2);

  omega2 = (-b - root)/(2);

  

  // .01 to allow for error

  /*

  if ((omega1 < 0.01) && (omega2 < 0.01) ){  //   <-------------

  	retVal = NO_INTERSECTION;

  } else if( (omega2 < .01 || omega1 <= omega2) && (omega1 > 0.01)){

  	retVal = omega1;

  } else if( (omega1 < .01 || omega2 <= omega1) && (omega2 > 0.01)){

  	retVal = omega2;

  }

  */

	}

	dist = retVal;

}

// GPU threads that calculate pixels

__global__ void raytracer(uchar4 *dest, int width, int height, float3 *camPos, float3 *camLookAt, float3 *camUp,

        int color) {

   const int ix = blockDim.x * blockIdx.x + threadIdx.x;

    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

	int pixel = iy*width + ix;

	// static test sphere

	float r=20;

	float3 s;

	s.x=0;

	s.y=0;

	s.z=0;

	// get this thread's ray

	// "center" ray:

	float directionX = (camLookAt->x - camPos->x) / 10.0f;

	float directionY = (camLookAt->y - camPos->y) / 10.0f;

	float directionZ = (camLookAt->z - camPos->z) / 10.0f;

	float3 rayDir;

	rayDir.x = directionX + (ix - width/2)*.01f;

	rayDir.y = directionY + (iy - height / 2)*.01f;

	rayDir.z = 1;

	

	// normalize

	normalizeRay(camPos, &rayDir);

	

	float intPoint;

	getSphereIntersection(intPoint, camPos, &rayDir, &s, r);

	if (intPoint == 0.0f) {

  dest[pixel].x = 0;

  dest[pixel].y = 0;

  dest[pixel].z = 0;

	} else {

  dest[pixel].x = color;

  dest[pixel].y = 0;

  dest[pixel].z = 0;

	}

}

// this spawns the GPU threads

void runRaytracer(uchar4 *dest, int width, int height, float3 *camPos, float3 *camLookAt, float3 *camUp) {

	dim3 threads(BLOCKDIM_X, BLOCKDIM_Y);

	dim3 grid(width/BLOCKDIM_X, height/BLOCKDIM_Y);

	static int color=150;

//	color=(color+1)%255;

	raytracer<<< grid, threads >>>(dest, width, height, camPos, camLookAt, camUp, color);

	CUT_CHECK_ERROR("CUDA Error\n");

}

i just checked using emu, and no, it doesn’t seem to be.

Are dest, camPos, camLookAt, and camUp pointers to host memory or device memory?

they are pointers to host memory. i’m guessing that is causing a problem…? any advice on how to get around that problem? should i be passing by values instead of reference?

That would be a pretty easy solution.

anybody know the max. amount of memory that can be sent to a global function? i was thinking this would exceed the limit (hence the pointers), but i can’t seem to find that number at the moment…

forgot to mention i asked this because changng to values didn’t help.

scratch that, i’m an idiot. it worked…

thanks everyone for the help. much appreciated!

I have had the exact same trouble, where I have been looking for 3 days where I had a bug and it turned out I was trying to dereference a pointer to host memory in my kernel function…

I must admit that I have had to buy an extra stock of brown paper bags since I started with Cuda…