I am just about able to allocate enough memory for an algorithm that processes 6 million particles on a single C870.
The algorithm transfers chunks of particle data from host to device. With a chunk size of 8k particles there is approximately 1.5 GB of free memory on the device and the algorithm works fine.
When I increase the chunk size to 64k particles there is still just under 1.4 GB of free memory on the device, but I get an unspecified launch failure from one particular kernel.
This kernel uses just 25 registers per thread (reported by nvcc with --ptxas-options=-v), so with a block size of 64 threads a block requires 1600 registers (the maximum is 8k per block), and the grid contains 1000 blocks (I'm not sure what the maximum is).
The kernel is launched as kernel<<<numblocks, numthreads>>>(), which with the numbers above is kernel<<<1000, 64>>>().
I would expect this to be well within resource limits, yet I get an unspecified launch failure.
Is the number of blocks too many?
Do I need to spec the grid in more detail?
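To rule out the limits themselves, I could run something like the following minimal sketch (variable names are just placeholders), which prints the device limits relevant to a <<<1000, 64>>> launch via cudaGetDeviceProperties:

[codebox]#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the limits that apply to a <<<1000, 64>>> launch on device 0.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int numthreads    = 64;    // threads per block
    int numblocks     = 1000;  // blocks in the (1-D) grid
    int regsPerThread = 25;    // from nvcc --ptxas-options=-v

    printf("max grid dim x       : %d\n", prop.maxGridSize[0]);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("registers per block  : %d\n", prop.regsPerBlock);

    if (numblocks > prop.maxGridSize[0])
        printf("too many blocks in the grid\n");
    if (regsPerThread * numthreads > prop.regsPerBlock)
        printf("block exceeds the register file\n");

    return 0;
}[/codebox]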
My apologies for going off-topic here; I'll keep it short.
I seem to recall having seen a link to a post that describes how to dynamically allocate shared memory, but I've lost it. Could you provide any links/insights, Nico?
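For reference, the pattern I'm thinking of declares an unsized extern __shared__ array inside the kernel and passes the byte count as the third launch-configuration parameter. Something like this minimal sketch (kernel and variable names are made up):

[codebox]#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *d_out, const float *d_in, float factor)
{
    // Unsized extern array: its byte size comes from the third <<< >>> parameter.
    extern __shared__ float s_data[];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    s_data[threadIdx.x] = d_in[i];            // stage the value through shared memory
    __syncthreads();
    d_out[i] = s_data[threadIdx.x] * factor;
}

int main()
{
    const int threads = 64, blocks = 4, n = threads * blocks;
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    // Shared memory size is chosen per launch: here one float per thread.
    size_t shmem = threads * sizeof(float);
    scale<<<blocks, threads, shmem>>>(d_out, d_in, 2.0f);

    cudaThreadSynchronize();
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}[/codebox]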
Can you get an unspecified launch failure from a cudaMemcpy? I've placed cudaGetLastError after every statement that involves the device, and the launch failure is always reported at one particular cudaMemcpy statement, though in different iterations of the loop. I have also placed dummy cudaGetLastError calls after the preceding kernel call to catch any delay in the kernel reporting an error, but none is reported, so I am assuming the launch failure comes from the cudaMemcpy (which isn't a kernel).
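For context, this is roughly the checking I have in mind (placeholder kernel and buffer names), with an explicit cudaThreadSynchronize added so that any delayed error from the kernel would surface before the copy rather than at it:

[codebox]#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one.
__global__ void step(float *d_buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_buf[i] += 1.0f;
}

static void check(const char *where)
{
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("%s: %s\n", where, cudaGetErrorString(err));
}

int main()
{
    const int n = 64 * 1000;
    float *h_buf = (float *)malloc(n * sizeof(float));
    float *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    step<<<1000, 64>>>(d_buf, n);
    check("launch");              // catches configuration errors only

    cudaThreadSynchronize();      // forces any error from the kernel itself
    check("after sync");          // to show up here, not at the next memcpy

    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    check("memcpy");              // anything reported here is genuinely the copy

    cudaFree(d_buf);
    free(h_buf);
    return 0;
}[/codebox]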
I would think that cudaMemcpy returns a different kind of error. Are you sure you're not trying to dereference a pointer to host memory in your kernel function? That's a typical cause of unspecified launch failures.
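For example, something along these lines (a deliberately contrived sketch) fails exactly that way when the kernel is handed a host pointer instead of a device pointer:

[codebox]#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *p) { p[threadIdx.x] = 1.0f; }

int main()
{
    float h_data[64];                       // lives in host memory
    float *d_data;
    cudaMalloc((void **)&d_data, 64 * sizeof(float));

    // Wrong: the kernel dereferences a host pointer, so the launch fails
    // with "unspecified launch failure".
    touch<<<1, 64>>>(h_data);
    cudaThreadSynchronize();
    printf("host pointer  : %s\n", cudaGetErrorString(cudaGetLastError()));

    // Right: pass the cudaMalloc'd device pointer instead.
    touch<<<1, 64>>>(d_data);
    cudaThreadSynchronize();
    printf("device pointer: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_data);
    return 0;
}[/codebox]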