Other causes of Unspecified Launch Failures

Hi

I have an assignment due in about 8 hours, so time is of the essence. Basically, I’ve been stuck for a day on one of my device functions.
cudaGetLastError says my kernel is failing with “unspecified launch failure”.
Although I know this is usually due to segfaults on the GPU (out-of-bounds memory accesses), I just cannot see a reason for that to be happening in my function.
Are there any other causes for this error type?

It seems to occur if I add a few extra instructions. Is there an instruction limit? Is it bad to have a while loop? How easy is it to run out of registers (surely the compiler warns of this)…

Also, instead of just exiting, the program takes about 30 seconds before it stops… whereas the working program finishes almost instantly.

any help is welcomed WITH WIDE OPEN ARMS

thanks

rewolf

EDIT: Oh, and I tried compiling with -G and running cuda-gdb… and even though I get my “unspecified launch failure”, cuda-gdb tells me the program exited normally… sigh… WHY DOESN’T IT BREAK? Or give me some more information?? sigh…

You could post your function here, of course. Perhaps someone may see something immediately.

The best answer to debugging memory issues: run in Ocelot’s emulator! That will tell you exactly where the segfault is. Unfortunately you don’t have time to deal with a new diagnostic tool, so I recommend going old-school.
Start commenting out different lines to find out at least which line is failing; then you can figure out why.

There is an instruction limit but it’s millions of instructions so you don’t need to worry. A while loop is fine, but make sure it terminates!
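To make the "make sure it terminates" point concrete, here is a minimal sketch of a while loop with an iteration cap. The function `boundedSqrt`, the kernel `sqrtKernel`, and the Newton iteration itself are hypothetical stand-ins, not your code; the only point is that the loop condition contains a counter that must eventually stop it, even for inputs that never converge.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical example: Newton iteration for sqrt(x).
// The loop stops on convergence OR when the iteration cap is reached,
// so it cannot spin forever on bad input (negative x, zero, and so on).
__host__ __device__ float boundedSqrt(float x, int maxIter)
{
    float guess = x > 1.0f ? x : 1.0f;
    int iter = 0;
    while (fabsf(guess * guess - x) > 1e-6f * x && iter < maxIter) {
        guess = 0.5f * (guess + x / guess);
        ++iter;
    }
    return guess;
}

// One value per thread, with a bounds guard so extra threads do nothing.
__global__ void sqrtKernel(const float *in, float *out, int n, int maxIter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = boundedSqrt(in[i], maxIter);
}
```

For a negative input the convergence test never passes, so without the `iter < maxIter` term each affected thread would loop until the kernel is killed.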
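Register use can be tricky, but yes, you’d know at compile time or at launch: asking for more registers than the device has fails with a different error (“too many resources requested for launch”), not an unspecified launch failure.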
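If you want to rule registers out quickly, compiling with `nvcc --ptxas-options=-v` prints the register count per kernel, and you can also ask the runtime with `cudaFuncGetAttributes`. A rough sketch follows; `myKernel` and `fitsInBlock` are placeholder names I made up, so substitute your own kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one (hypothetical).
__global__ void myKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sqrtf((float)i);
}

// Prints the kernel's resource usage and reports whether a block of
// threadsPerBlock threads is within the per-kernel limit the runtime reports.
bool fitsInBlock(int threadsPerBlock)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, myKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return false;
    }
    std::printf("registers per thread: %d\n", attr.numRegs);
    std::printf("static shared memory: %zu bytes\n", attr.sharedSizeBytes);
    std::printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    std::printf("max threads per block for this kernel: %d\n", attr.maxThreadsPerBlock);
    return threadsPerBlock > 0 && threadsPerBlock <= attr.maxThreadsPerBlock;
}
```

Calling `fitsInBlock(256)` before a launch with 256 threads per block would tell you whether the block size is the problem; a large `localSizeBytes` is a hint that registers are spilling.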

One last piece of random advice… check for CUDA errors after EVERY CUDA call. Perhaps a cudaMemcpy fails, you don’t check it, you launch your kernel, and your kernel tailspins because it assumes its input data was copied properly.
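Here is roughly what checking every call can look like. This is a sketch under my own assumptions: `cudaOk`, `CUDA_OK`, `scale`, and `runScale` are made-up names, and the kernel is a trivial placeholder. The runtime calls themselves (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are the standard ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA call with its source location
// and returns false so the caller can stop before launching anything else.
static bool cudaOk(cudaError_t err, const char *what, const char *file, int line)
{
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, what,
                 cudaGetErrorString(err));
    return false;
}
#define CUDA_OK(call) cudaOk((call), #call, __FILE__, __LINE__)

// Placeholder kernel: doubles each element, with a bounds guard.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns true only if every step succeeded. Assumes n > 0.
bool runScale(float *host, int n)
{
    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (!CUDA_OK(cudaMalloc(&dev, bytes))) return false;

    bool ok = CUDA_OK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, n);
        ok = CUDA_OK(cudaGetLastError())        // bad launch configuration
          && CUDA_OK(cudaDeviceSynchronize())   // errors raised while the kernel ran
          && CUDA_OK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    }
    cudaFree(dev);
    return ok;
}
```

Because kernel launches are asynchronous, a crash inside the kernel is only reported by a later call, which is why the sketch checks both right after the launch and after the synchronize. If the first cudaMemcpy fails, the kernel is never launched, and the message points at the call that actually broke.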

Thanks very much! Very useful tips. In the meantime I’ve simplified the function a lot by using approximations instead. I’ll definitely look into that emulator soon, though.

thanks again!