Simple code causes cudaError_enum

Hi all-

I’m new here and was making good progress with CUDA, transferring data back and forth and building towards more complex computation, until I hit the following problem. I’m on WinXP with an 8800GS, if that matters. The following snippet causes a cudaError_enum exception (on each thread, I think) if any of the lines marked //E are put back in. Another message comes up in the stdout window, but I can’t catch it. This only happens in the actual device build; it works fine in EMU mode.

Before this version, I had the routine return the intersected point through a pointer, but that caused the same exception. The handling of automatic variables and function return values isn’t explained in the CUDA refs, so I’m not sure how these work exactly. Do they have to be allocated via cudaMalloc instead?

This routine is used within another, but there isn’t much stack depth so I don’t think that’s the problem. THANKS!

struct Intersection {
	float3 p;
	bool isValid;
};

__device__ __host__ struct Intersection lineplaneIntersect(const float3 linep1, const float3 linep2,
                                                           const float3 planept, const float3 planenormal)
{	// isValid is true if the intersection exists, else false
	struct Intersection intersection;

	float udem = dot3D(planenormal, linep2 - linep1);
	if( isSmall(udem) )	// plane and line parallel
	{	// define intersect as all -1
//E		intersection.p = make_float3(-1.0f, -1.0f, -1.0f);
//E		intersection.isValid = false;
		return intersection;	// linep's COULD be on the plane and should return true???
	}

	float unom = dot3D(planenormal, planept - linep1);
	float u = unom / udem;

//E	intersection.p = linep1 + u*(linep2 - linep1);
//E	intersection.isValid = true;

	return intersection;
}


Maybe your pointer sat in host memory. By the way, are you calling this function from a kernel? I see that it’s built as both a host and a device function.

See my question above.

Thanks for the response.

The errors are when called from a kernel.

In the case posted, the variable ‘intersection’ is declared as a local variable, so when this runs in a kernel, does it automatically reside on the device? The manual isn’t very clear, although it refers to local variables being placed in registers, per below. Other local variables don’t seem to be a problem, though. In the earlier version of the function I alluded to, I passed a pointer to a local variable into the function and also had issues. That is what led me towards suspecting the handling of auto/local variables as the problem.
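For what it’s worth, a local (automatic) variable inside a __device__ function lives in registers or per-thread local memory automatically; no cudaMalloc is needed for it. A minimal sketch of the pattern (the struct, function, and kernel names here are illustrative, not from the posted code):

```cuda
struct Pair { float a; float b; };

// Returning a small struct by value from a device function is fine:
// the compiler keeps it in registers (or spills it to per-thread
// local memory), with no explicit allocation needed.
__device__ Pair makePair(float x)
{
    Pair p;          // automatic variable, lives on the device
    p.a = x;
    p.b = x * 2.0f;
    return p;        // returned by value, like ordinary C
}

__global__ void kernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        Pair p = makePair((float)i);
        out[i] = p.a + p.b;
    }
}
```

Only the output buffer (`out`, a device pointer) needs an explicit cudaMalloc on the host side.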

Maybe this is a red herring and something else is wrong.

BTW, I’m using VS 2005. Also, each thread is independent.

Can I use C++ try/catch blocks in host code, or do I need to use the signal library to catch exceptions? EDIT: I tried using signal but can’t seem to catch the error, so I can’t see what appears in the printf screen before the program closes.
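For reference, errors from the CUDA runtime don’t surface as C++ exceptions or host signals; the usual pattern (and roughly what cutilCheckMsg does internally) is to poll the runtime after a launch. A sketch, where myKernel, grid, and block are placeholders:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Checks for errors after a kernel launch. cudaThreadSynchronize()
// is the CUDA 2.x-era call (later renamed cudaDeviceSynchronize()).
void checkLastLaunch(const char* where)
{
    cudaThreadSynchronize();               // wait for the kernel to finish
    cudaError_t err = cudaGetLastError();  // fetch any launch/runtime error
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error at %s: %s\n",
                where, cudaGetErrorString(err));
    }
}

// Usage (placeholders, not from the posted code):
//   myKernel<<<grid, block>>>(args);
//   checkLastLaunch("myKernel");
```

A launch-configuration problem such as “too many resources requested for launch” shows up through cudaGetLastError this way.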

Oh, my attempted signal-catching code looks like this (which doesn’t seem to work):

[codebox]__host__ void leave(int sig) {
    printf("Exception occurred. Hit any key\n");
    char c;
    scanf("%c", &c);
}

// Program main
int main( int argc, char** argv)
{
    // set up signals
    (void) signal(SIGINT,  leave);
    (void) signal(SIGFPE,  leave);
    (void) signal(SIGABRT, leave);
    (void) signal(SIGILL,  leave);
    (void) signal(SIGSEGV, leave);
    (void) signal(SIGTERM, leave);

    runTest(argc, argv);
    cutilExit(argc, argv);
}[/codebox]

OK, this thread:
explained how to catch the error message in the stdout window. It came at an opportune time for me.

So now that I can catch the message it is:

cutilCheckMsg() CUTIL CUDA error: Kernel execution failed in file <>, line 267 : too many resources requested for launch.
Press any key to continue . . .

I think I have the grid and block parameters wrong for the kernel launch; I should be able to figure it out from here …