Unspecified launch failure 4 kernel calls

jordyvaneijk · April 1, 2008, 12:57pm

I have a function that calls my kernel 4 times. For all my kernel calls I get the error “Unspecified Launch Failure”. Maybe someone can help me on this one. I looked around the forum and searched for the error, but couldn’t find anything useful about it.

I hope someone can help me with this error…

Also, when I run the program in device emulation mode I get a segmentation fault. If I write my output to a file in device emulation mode, nothing shows up in the file, but when I don’t write the output to a file I see the output on the screen. Can someone please explain what is happening here?

DenisR · April 1, 2008, 1:01pm

A segmentation fault means you are accessing memory that you did not allocate, most likely by writing past the end of an array. That is also consistent with an unspecified launch failure (I checked this morning :D )

I believe you can run your emulation code under valgrind to find out where it goes wrong.

MisterAnderson42 · April 1, 2008, 1:35pm

If you are running on Linux, valgrind is an amazing tool: run your device emulation binary through it and it can show you where the out-of-bounds memory write happens.

Edit: I must be tired this morning, I missed that DenisR had already mentioned valgrind.

DenisR · April 1, 2008, 2:29pm

Although I have never needed to use it myself, what I have heard about Valgrind warrants it being mentioned twice :D

jordyvaneijk · April 1, 2008, 2:50pm

Thank you both so much… I have Valgrind installed on my computer but completely forgot about it… But the first thing Denis mentioned got me thinking. I did indeed have an out-of-bounds memory problem… Now I have fixed it and the program is working again…

One step closer to completion :D:D:D

MisterAnderson42 · April 1, 2008, 3:42pm

I’m glad you fixed your problem.

For the benefit of anyone else who reads this thread: accessing past the end of allocated memory doesn’t always cause an “unspecified launch failure” right away. There are times I’ve made thousands of kernel calls (all writing outside their memory) and then the next call causes an unspecified launch failure.

Oh, and I’ve also seen out of bounds memory accesses trigger the 5s launch timeout with the error message “the kernel launch has timed out and been terminated”.
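Part of why the failure can surface at a later call is that kernel launches are asynchronous: an error raised while a kernel runs is only reported by a later runtime call unless you synchronize first. Below is a minimal sketch of checking after every launch, assuming a current toolkit (`cudaDeviceSynchronize` is the modern name; the kernel `scale` and the helpers `reportCudaError` and `launchChecked` are hypothetical, not from this thread). It will not catch out-of-bounds writes that happen to land in valid memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a message and returns true if `err` is an error.
static bool reportCudaError(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return false;
    std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    return true;
}

// Illustrative kernel; the guard keeps every access in bounds.
__global__ void scale(float *data, int n, float factor)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index < n)
        data[index] *= factor;
}

// Launches the kernel and checks both kinds of error.
// Returns cudaSuccess only if the launch and the execution both succeeded.
cudaError_t launchChecked(float *d_data, int n)
{
    const int blockSize = 64;
    const int numBlocks = (n + blockSize - 1) / blockSize;
    scale<<<numBlocks, blockSize>>>(d_data, n, 2.0f);

    cudaError_t err = cudaGetLastError();      // launch/configuration errors
    if (reportCudaError(err, "launch")) return err;
    err = cudaDeviceSynchronize();             // errors raised while running
    reportCudaError(err, "execution");
    return err;
}

int main()
{
    const int n = 65;
    float *d_data = nullptr;
    if (reportCudaError(cudaMalloc((void **)&d_data, n * sizeof(float)), "cudaMalloc"))
        return 1;
    cudaMemset(d_data, 0, n * sizeof(float));
    cudaError_t err = launchChecked(d_data, n);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Synchronizing after every launch costs performance, so a check like this is probably best kept to debug builds.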

DenisR · April 1, 2008, 3:44pm

Same thing for me: I also went out of bounds without any error, and just scaling up my problem size made the errors appear (as both the 5 s timeout and unspecified launch failures).

Linh_Ha · April 1, 2008, 5:08pm

I find it costs almost nothing to check the bounds inside a CUDA kernel, so I check boundary conditions whenever I can.

jordyvaneijk · April 2, 2008, 7:41am

Maybe this is a very stupid question but how do you check this?

Thanks again, Jordy

DenisR · April 2, 2008, 11:53am

I would guess something like this:

__global__ int func_with_check(int *input, int inputsize, float *output, int outputsize)
{
  int index = ......;
  int index_out = ......;
  int in;

  if (index < inputsize)
    in = input[index];
  else
    return index; // Or something else

  if (index_out < outputsize)
    output[index_out] = ......; // some value computed from in
  else
    return -index_out; // Or something else

  return 0;
}

So when the return value is non-zero, it tells you where you went out of bounds.

MisterAnderson42 · April 2, 2008, 2:38pm

How can you return a value from a global function? ;)

Anyway, I find that I often need constructs like this in my code:

if (index < inputsize)
  in = input[index];
else
  in = 0;
...
if (index < inputsize)
  out[index] = computed_value;

Or, if there are no __syncthreads() calls in the kernel, one can just do if (index >= inputsize) return; at the top.

Usually, index = blockDim.x * blockIdx.x + threadIdx.x. To keep things simple, let’s say my inputSize is 65 and my block size is 64. I need two blocks to cover the input, so the calculated index goes up to 127, and I must not access past the end of my array; hence the need for the check (just without the return value).
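The 65-element, 64-thread-block case can be sketched as a complete program. This is only an illustration: the kernel `doubleElements` and the helpers `blocksFor` and `doubleOnDevice` are made-up names, and the early return is safe only because this kernel never calls `__syncthreads()`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread doubles one element; threads past the end do nothing.
__global__ void doubleElements(const int *input, int *output, int inputSize)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= inputSize)   // tail-block guard (no __syncthreads() below)
        return;
    output[index] = 2 * input[index];
}

// Number of blocks needed to cover n elements (rounds up).
int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Runs the kernel on host data (assumed non-empty); returns true on success.
bool doubleOnDevice(const std::vector<int> &in, std::vector<int> &out)
{
    const int n = static_cast<int>(in.size());
    const size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    out.assign(n, 0);
    bool ok = cudaMalloc((void **)&d_in, bytes) == cudaSuccess &&
              cudaMalloc((void **)&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int blockSize = 64;
        doubleElements<<<blocksFor(n, blockSize), blockSize>>>(d_in, d_out, n);
        // The blocking copy back also reports errors raised by the kernel.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return ok;
}

int main()
{
    std::vector<int> in(65), out;
    for (int i = 0; i < 65; ++i) in[i] = i;
    if (!doubleOnDevice(in, out)) { std::puts("CUDA error"); return 1; }
    std::printf("blocks=%d last=%d\n", blocksFor(65, 64), out[64]);
    return 0;
}
```

With the guard in place, threads 65 through 127 of the second block simply exit. Device emulation mode no longer exists in current toolkits; there, running the binary under `compute-sanitizer` plays a role similar to the valgrind approach and should flag the access if the guard is removed.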

I have a LOT of kernels like this, and sometimes forget the check; hence my extensive experience with kernels writing past the end of arrays :) Now, though, I have my build system set up to compile the whole thing in emulation mode and then run all the unit tests through valgrind, to catch these kinds of errors before they drive me insane.

However, while I add checks where they are needed because of the “tail” block, I do not add a check to every single memory write. That would be a little tedious, and sometimes bad writes are due to a bug in the index calculation. I’d rather have the system crash because of a bad write (that can be found with valgrind) than get absolutely no output written to an array and be left wondering why.

DenisR · April 2, 2008, 5:26pm

Aargghh, that is only possible from device functions, of course… I just started to use device functions myself; I had no need for them earlier.
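Since a kernel has to return void, one way to keep the spirit of the return-value idea is to let a device function do the check and have the kernel record the offending index in global memory. The following is a rough sketch under that assumption; `inBounds`, `copyChecked` and `lowestBad` are hypothetical names, and the buffers are deliberately made too small so that the report triggers.

```cuda
#include <climits>
#include <cstdio>
#include <cuda_runtime.h>

// A __host__ __device__ function may return a value, unlike a kernel.
__host__ __device__ bool inBounds(int index, int size)
{
    return index >= 0 && index < size;
}

// Kernels return void, so out-of-range indices are recorded in *lowestBad
// (which the host initialises to INT_MAX) instead of being returned.
__global__ void copyChecked(const int *input, int *output, int n,
                            int bufferSize, int *lowestBad)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= n)                    // ordinary tail-block guard
        return;
    if (!inBounds(index, bufferSize)) {
        atomicMin(lowestBad, index);   // remember the smallest offending index
        return;                        // skip the bad access
    }
    output[index] = input[index];
}

int main()
{
    const int n = 65;           // elements the launch is sized for
    const int bufferSize = 60;  // deliberately too small, to trigger the report
    int *d_in = nullptr, *d_out = nullptr, *d_bad = nullptr;
    int bad = INT_MAX;

    cudaMalloc((void **)&d_in, bufferSize * sizeof(int));
    cudaMalloc((void **)&d_out, bufferSize * sizeof(int));
    cudaMalloc((void **)&d_bad, sizeof(int));
    cudaMemset(d_in, 0, bufferSize * sizeof(int));
    cudaMemcpy(d_bad, &bad, sizeof(int), cudaMemcpyHostToDevice);

    copyChecked<<<(n + 63) / 64, 64>>>(d_in, d_out, n, bufferSize, d_bad);
    // The blocking copy back also reports any error raised by the kernel.
    cudaError_t err = cudaMemcpy(&bad, d_bad, sizeof(int), cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    else if (bad != INT_MAX)
        std::printf("lowest out-of-bounds index: %d\n", bad);
    else
        std::printf("no out-of-bounds access\n");

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_bad);
    return err == cudaSuccess ? 0 : 1;
}
```

`atomicMin` is used so the reported index does not depend on which thread happens to run first; here indices 60 through 64 are out of range, so the lowest one reported would be 60.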
