CUDA crashes for no apparent reason

Hi all,

I’ve been banging my head against this all day: my CUDA program keeps crashing and I cannot figure out why. If I take out (seemingly) random lines, it works fine. Also, everything runs fine under emudebug.

for example:

float specular = dotProduct(&ref, &rayDir);
if (specular != 0) {
	specular = 0;
}
specular = pow(specular, KS_EXP);
unsigned char red = (unsigned char)(color*specular*KS_DEFAULT);

crashes, while:

float specular = 0;
if (specular != 0) {
	specular = 0;
}
specular = pow(specular, KS_EXP);
unsigned char red = (unsigned char)(color*specular*KS_DEFAULT);

works fine. Also,

float specular = dotProduct(&ref, &rayDir);
specular = 0;
if (specular != 0) {
	specular = 0;
}
specular = pow(specular, KS_EXP);
unsigned char red = (unsigned char)(color*specular*KS_DEFAULT);

works as well.

From this, one might think dotProduct() is the problem:

__device__ inline float dotProduct(float3 *r1, float3 *r2) {
	float dot = (r1->x*r2->x) + (r1->y*r2->y) + (r1->z*r2->z);
	return dot;
}

I call this function elsewhere, and it seems to work fine. Also, I’ve verified that all variables contain “valid” data. Is there a maximum execution time, or size, for a kernel? ANY ideas would be helpful.

float specular = dotProduct(...);
if (specular != 0)
  specular = 0;
specular = pow(specular, KS_EXP);
unsigned char red = (unsigned char)(color*specular*KS_DEFAULT);

I don’t see why you need to call “dotProduct” if “specular” is going to be set to zero anyway. Also, “pow” is always passed a first argument of 0, which (assuming KS_EXP is positive) will only ever give you back 0, so “red” is always going to be 0 as well. So, what does this code actually do?

If possible, paste the PTX assembly for this code snippet so we can see what is going on. To generate the PTX, use the “-keep” nvcc option, and paste only the relevant section here.

By the way, there’s this cute little icon :argh: to express this… :-)

:( Unverified emudebug output can’t be taken as a measure of correctness. I was writing a Gaussian blur function to learn CUDA, and emudebug SEEMED to run it right, while running it on the card gave obviously bogus results.

It turned out that there was a mistake in my code, but the emudebug version only gave subtly wrong results: the emulation added prettier errors to the calculation than the device itself did. What I mean is that I’ll be checking CUDA results bit-for-bit against a CPU-based implementation the next time things get weird. :magic:
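
For what it’s worth, a bit-for-bit check like that could be sketched roughly as below. This is only an illustration: the helper name firstBitMismatch is made up, and it assumes both result buffers have already been copied back to host memory as plain float arrays of the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from this thread): returns the index of the first
// element whose 32-bit pattern differs between the two buffers, or n if the
// buffers are bit-identical.
std::size_t firstBitMismatch(const float *gpu, const float *cpu, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t a, b;
        // Compare raw bits rather than values, so NaN vs NaN and 0.0f vs -0.0f
        // are not hidden by floating-point comparison rules.
        std::memcpy(&a, &gpu[i], sizeof a);
        std::memcpy(&b, &cpu[i], sizeof b);
        if (a != b) return i;
    }
    return n;
}
```

Exact equality can be too strict in practice, since the device may combine a multiply and an add into one instruction where the CPU rounds twice. A mismatch index is still a useful pointer to where the two versions start to diverge.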

My apologies… the setting to zero was my feeble attempt at figuring out what was going on. The “correct” code is:

float specular = dotProduct(...);
specular = pow(specular, KS_EXP);
unsigned char red = (unsigned char)(color*specular*KS_DEFAULT);
dest[index].x = red;

“dest” is a uchar4* that the host maps to an OpenGL PBO.

I will try to post the assembly code today… thanks for the idea.

Alright, I haven’t read assembly for a while, so I modified the code to make its assembly easier to find:

float specular = dotProduct(&ref, &rayDir)*888.0f;
//specular = pow(specular, KS_EXP);
unsigned char red = (unsigned char)(color*diffuse*KD_DEFAULT + color*specular*KS_DEFAULT);
dest[pixel].x = red;

A note about the pow(): when it was not commented out, I got several 1.#INF and NaN values. Taking out pow() fixed this, but the program still crashes. I’m so confused!
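
One possible source of the 1.#INF and NaN values, offered as a guess since the inputs are not shown: pow() returns NaN for a negative base with a non-integer exponent, and a dot product of two directions is negative whenever they point more than 90 degrees apart. Converting a NaN or out-of-range float to unsigned char is also undefined behavior in C and C++. A minimal sketch of guarding both steps is below. The names specularTerm and toByte are hypothetical, and ksExp stands in for KS_EXP, whose value is not given here.

```cpp
#include <math.h>

// Lets the same source build as plain C++ or as CUDA under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: clamp the dot product to zero from below before raising
// it to a power, so a negative base never reaches powf().
HD inline float specularTerm(float dot, float ksExp) {
    float base = dot > 0.0f ? dot : 0.0f;  // NaN fails the test too and becomes 0
    return powf(base, ksExp);
}

// Hypothetical helper: saturate to [0, 255] before the cast. NaN maps to 0.
HD inline unsigned char toByte(float v) {
    if (!(v > 0.0f)) return 0;    // negatives, zero and NaN
    if (v > 255.0f) return 255;   // large values and +infinity
    return (unsigned char)v;      // in range, truncates toward zero
}
```

With helpers like these, the last two lines of the snippet would become something like specular = specularTerm(specular, KS_EXP) followed by red = toByte(...). This would only address the bad values; it is not claimed to explain the crash itself.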

The PTX code:

$Lt_0_176:
	.loc	14	249	0
	mul.f32  $f187, $f173, $f162;  //
	mul.f32  $f188, $f176, $f164;  //
	mul.f32  $f189, $f179, $f166;  //
	add.f32  $f190, $f187, $f188;  //
	add.f32  $f191, $f189, $f190;  //
	add.f32  $f192, $f176, $f176;  //
	mul.f32  $f193, $f192, $f191;  //
	sub.f32  $f194, $f164, $f193;  //
	mul.f32  $f195, $f94, $f194;  //
	add.f32  $f196, $f173, $f173;  //
	mul.f32  $f197, $f196, $f191;  //
	sub.f32  $f198, $f162, $f197;  //
	mad.f32  $f199, $f17, $f198, $f195;  //
	add.f32  $f200, $f179, $f179;  //
	mul.f32  $f201, $f200, $f191;  //
	sub.f32  $f202, $f166, $f201;  //
	mad.f32  $f203, $f28, $f202, $f199;  //
	mov.f32  $f204, 0f445e0000;  //  888
	mul.f32  $f205, $f203, $f204;  //
	mul.f32  $f206, $f159, $f205;  //
	mov.f32  $f207, 0f3f333333;  //  0.7
	mul.f32  $f208, $f206, $f207;  //
	mov.f32  $f209, 0f3f000000;  //  0.5
	mov.f32  $f210, 0f3f000000;  //  0.5
	mad.f32  $f211, $f191, $f210, $f209;  //
	mul.f32  $f212, $f159, $f211;  //
	mov.f32  $f213, 0f3f333333;  //  0.7
	mad.f32  $f214, $f212, $f213, $f208;  //
	cvt.rzi.u32.f32  $r46, $f214;  //
	cvt.u8.u32  $rh15, $r46;  //
	.loc	14	250	0
	mov.u16  $rh16, $rh15;  //
	.loc	14	252	0
	mov.u16  $rh17, 0;  //
	st.global.v2.u8  [$r29+0], {$rh16,$rh17};  //
	.loc	14	253	0
	mov.u16  $rh18, 0;  //
	st.global.u8  [$r29+2], $rh18;  //  id:781
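
For anyone reading along, the 0f... operands in the listing are single-precision constants written as their IEEE-754 bit patterns in hex, which is why the compiler annotated them as 888, 0.7 and 0.5. The 888 constant is the marker multiplied into the dot product in the modified source. A small host-side sketch for decoding such literals follows; decodePtxFloat is a made-up name, not part of any toolkit, and it assumes well-formed input.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: turn a PTX literal such as "0f445e0000" into a float.
// Assumes the input is "0f" followed by exactly 8 hex digits; no validation.
float decodePtxFloat(const char *literal) {
    // Skip the "0f" prefix and parse the remaining digits as a 32-bit pattern.
    std::uint32_t bits = (std::uint32_t)std::strtoul(literal + 2, nullptr, 16);
    float value;
    // Copy the bits into a float instead of casting pointers, which avoids
    // strict-aliasing problems.
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Decoding the three constants in the listing gives 888.0f for 0f445e0000, 0.5f for 0f3f000000, and the float nearest to 0.7 for 0f3f333333.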