Unspecified launch failure strange error, please help

cronus98 · December 29, 2007, 8:00pm

hello,

i am very new to CUDA, so my apologies if my problem is obvious. i’ve been just getting familiar with cuda, and life was good until i started getting this “'unspecified launch error”. i’ve narrowed the problem down to 2 sections of code…if i uncomment either section it errors out and (sometimes) requires a hard reboot.

i’ve marked the problem sections with a ‘// <-------------’

// performs ray-sphere intersection

__device__ float getSphereIntersection(float3 *rayPos, float3 *rayDir, float3 *sPos, float sRadius) {

	float retVal = NO_INTERSECTION;

	float a, b, c, root, omega1, omega2;

	normalizeRay(rayPos, rayDir);

	float x = rayPos->x;

	float y = rayPos->y;

	float z = rayPos->z;

	float sphereX = sPos->x;

	float sphereY = sPos->y;

	float sphereZ = sPos->z;

	float dx = rayDir->x;

	float dy = rayDir->y;

	float dz = rayDir->z;

	a = (dx * dx) + (dy * dy) + (dz * dz);

	b = 2 * (dx * (x  - sphereX) + dy * (y - sphereY) + dz * (z - sphereZ));

	c = (x - sphereX) * (x - sphereX) + (y - sphereY) * (y - sphereY) + (z - sphereZ) * (z - sphereZ) -

  sRadius * sRadius;

	root = b*b - 4*c;

	// no intersection

	if (root < 0) {

  retVal = NO_INTERSECTION;

	}

	if (root == 0) {

  //retVal = -b + sqrtf(root)/2;

  //return -b;                                    //   <-------------

	} 

	

	if (root > 0) {

  root = sqrtf(root);

 omega1 = (-b + root)/(2);

  omega2 = (-b - root)/(2);

  

  // .01 to allow for error

  /*if ((omega1 < 0.01) && (omega2 < 0.01) ){  //   <-------------

  	retVal = NO_INTERSECTION;

  }else if( (omega2 < .01 || omega1 <= omega2) && (omega1 > 0.01)){

  	retVal = omega1;

  }else if( (omega1 < .01 || omega2 <= omega1) && (omega2 > 0.01)){

  	retVal = omega2;

  }

  */

	}

	return retVal;

}

let me know if more code is required. any help would be greatly appreciated!

Jonathan_Cohen · December 30, 2007, 7:31pm

I suspect the complete code would be required to solve the problem. Truthfully, you may have to bang your head on this sort of thing for a number of hours. When you see “Unspecified launch failure”, you should often translate it to “Segmentation fault.” Just as on a more familiar CPU, a bad memory reference will sometimes cause a segmentation fault, and sometimes just quietly trash some memory. So eliminating the launch failure does not guarantee that the code works correctly. Remember also that commenting out lines of code sometimes has a more global effect than you might expect, so cause and effect can be tricky to narrow down.

I assume the code works in emulation mode? (If not, of course you should fix it there with the help of the debugger.) The most frustrating problems are those that occur only on the actual device. This can happen for lots of reasons. Besides synchronization issues, the most common one I see people report is pointer memory space issues. This is confusion by the coder (or the compiler!) about host versus global versus constant versus shared memory spaces. If you are using CUDA 1.1, the compiler will at least warn you if it is too confused (in CUDA 1.0, it used to sometimes do the wrong thing silently).

cronus98 · December 30, 2007, 11:04pm

I suspect the complete code would be required to solve the problem. Truthfully, you may have to bang your head on this sort of thing for a number of hours. When you see “Unspecified launch failure”, you should often translate it to “Segmentation fault.” Just as on a more familiar CPU, a bad memory reference will sometimes cause a segmentation fault, and sometimes just quietly trash some memory. So eliminating the launch failure does not guarantee that the code works correctly. Remember also that commenting out lines of code sometimes has a more global effect than you might expect, so cause and effect can be tricky to narrow down.

I assume the code works in emulation mode? (If not, of course you should fix it there with the help of the debugger.) The most frustrating problems are those that occur only on the actual device. This can happen for lots of reasons. Besides synchronization issues, the most common one I see people report is pointer memory space issues. This is confusion by the coder (or the compiler!) about host versus global versus constant versus shared memory spaces. If you are using CUDA 1.1, the compiler will at least warn you if it is too confused (in CUDA 1.0, it used to sometimes do the wrong thing silently).

[snapback]302311[/snapback]

thanks for the help. unfortunately it works in emulation so i’m pretty stuck. ive been banging my head on this wall for a while now :-(

it’s just weird that the lines that “appear” to be causing the problems are simple returns and assignments to local variables…

nwilt · December 30, 2007, 11:35pm

More context is needed here, since the device function gets inlined by the compiler. Can you post more complete source code, or at least the global function that invokes the device function? When you change return statements of inlined functions, that can have very broad implications for the generated code. When you comment out a return statement, all of the code that the return statement depends on becomes subject to dead code elimination…

Jonathan_Cohen · December 31, 2007, 1:21am

I don’t suppose you’re dividing by 0 in normalizeRay()?

cronus98 · December 31, 2007, 1:24am

I’ve been looking at the examples that come with the sdk, and they seem to avoid functions that return floats. i realize it’s a rather sad attempt to fix it, but hey, we (i) get desparate after things just don’t work. so, i have since changed the function to void and it now takes another argument, which it modifies to the desired value. unfortunately, it still gives the same error.

…sorry for this long code section.

#define BLOCKDIM_X 200

#define BLOCKDIM_Y 2

__device__ inline void getSphereIntersection(float &dist, float3 *rayPos, float3 *rayDir, float3 *sPos, float sRadius) {

	float retVal = 0.0f;

	float a, b, c, root, omega1, omega2;

	//a = (dx * dx) + (dy * dy) + (dz * dz);

	b = 2 * (rayDir->x * (rayPos->x  - sPos->x) + rayDir->y * (rayPos->y - sPos->y) + rayDir->z * (rayPos->z - sPos->z));

	c = (rayPos->x - sPos->x) * (rayPos->x - sPos->x) + (rayPos->y - sPos->y) * (rayPos->y - sPos->y) + 

  (rayPos->z - sPos->z) * (rayPos->z - sPos->z) -	sRadius * sRadius;

	root = b*b - 4*c;

	// no intersection

	if (root < 0) {

  retVal = NO_INTERSECTION;

	}

	if (root == 0) {

  //retVal = -b + sqrtf(root)/2;

  //retVal = -b;                                    //   <-------------

	} 

	

	if (root > 0) {

  root = sqrtf(root);

 omega1 = (-b + root)/(2);

  omega2 = (-b - root)/(2);

  

  // .01 to allow for error

  /*

  if ((omega1 < 0.01) && (omega2 < 0.01) ){  //   <-------------

  	retVal = NO_INTERSECTION;

  } else if( (omega2 < .01 || omega1 <= omega2) && (omega1 > 0.01)){

  	retVal = omega1;

  } else if( (omega1 < .01 || omega2 <= omega1) && (omega2 > 0.01)){

  	retVal = omega2;

  }

  */

	}

	dist = retVal;

}

// GPU threads that calculate pixels

__global__ void raytracer(uchar4 *dest, int width, int height, float3 *camPos, float3 *camLookAt, float3 *camUp,

        int color) {

   const int ix = blockDim.x * blockIdx.x + threadIdx.x;

    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

	int pixel = iy*width + ix;

	// static test sphere

	float r=20;

	float3 s;

	s.x=0;

	s.y=0;

	s.z=0;

	// get this thread's ray

	// "center" ray:

	float directionX = (camLookAt->x - camPos->x) / 10.0f;

	float directionY = (camLookAt->y - camPos->y) / 10.0f;

	float directionZ = (camLookAt->z - camPos->z) / 10.0f;

	float3 rayDir;

	rayDir.x = directionX + (ix - width/2)*.01f;

	rayDir.y = directionY + (iy - height / 2)*.01f;

	rayDir.z = 1;

	

	// normalize

	normalizeRay(camPos, &rayDir);

	

	float intPoint;

	getSphereIntersection(intPoint, camPos, &rayDir, &s, r);

	if (intPoint == 0.0f) {

  dest[pixel].x = 0;

  dest[pixel].y = 0;

  dest[pixel].z = 0;

	} else {

  dest[pixel].x = color;

  dest[pixel].y = 0;

  dest[pixel].z = 0;

	}

}

// this spawns the GPU threads

void runRaytracer(uchar4 *dest, int width, int height, float3 *camPos, float3 *camLookAt, float3 *camUp) {

	dim3 threads(BLOCKDIM_X, BLOCKDIM_Y);

	dim3 grid(width/BLOCKDIM_X, height/BLOCKDIM_Y);

	static int color=150;

//	color=(color+1)%255;

	raytracer<<< grid, threads >>>(dest, width, height, camPos, camLookAt, camUp, color);

	CUT_CHECK_ERROR("CUDA Error\n");

}

cronus98 · December 31, 2007, 1:28am

i just checked using emu, and no, it doesn’t seem to be.

Jonathan_Cohen · December 31, 2007, 1:43am

Are dest, camPos, camLookAt, and camUp pointers to host memory or device memory?

cronus98 · December 31, 2007, 1:59am

they are pointers to host memory. i’m guessing that is causing a problem…? any advice on how to get around that problem? should i be passing by values instead of reference?

Jonathan_Cohen · December 31, 2007, 2:03am

That would be a pretty easy solution.

cronus98 · December 31, 2007, 2:15am

anybody know the max. amount of memory that can be sent to a global function? i was thinking this would exceed the limit (hence the pointers), but i can’t seem to find that number at the moment…

cronus98 · December 31, 2007, 2:22am

forgot to mention i asked this because changng to values didn’t help.

cronus98 · December 31, 2007, 2:28am

scratch that, i’m an idiot. it worked…

thanks everyone for the help. much appreciated!

DenisR · December 31, 2007, 7:59am

I have had the exact same trouble, where I have been looking for 3 days where I had a bug and it turned out I was trying to dereference a pointer to host memory in my kernel function…

I must admit that I have had to buy an extra stock of brown paper bags since I started with Cuda…

Topic		Replies	Views
Unspecifiec launch failure on CUDA_SAFE_CALL(cudaThreadSynchronize()) CUDA Programming and Performance	5	2121	January 27, 2011
unspecified launch failure simple volume initialization fails CUDA Programming and Performance	9	6221	August 26, 2007
Embarassingly beginner question CUDA Programming and Performance	8	3292	May 22, 2009
another unspecified launcher failure CUDA Programming and Performance	1	2899	March 15, 2009
Unspecified launch failure error CUDA Programming and Performance	10	18691	January 6, 2018
Strange "unspecified launch error" from a call to cublas gemm CUDA Programming and Performance	23	2744	January 19, 2019
CUDA error: unspecified launch failure CUDA Programming and Performance	0	1440	May 10, 2011
"unspecified launch failure" - ERROR CUDA Programming and Performance	9	14323	July 19, 2011
Unspecified launch failure CUDA Programming and Performance	2	5708	May 24, 2009
Unspecified launch failure on a nested loop (C - multiple pattern matching algorithm) A double loop CUDA Programming and Performance	5	903	November 22, 2010

Unspecified launch failure strange error, please help

Related topics