"CUDA grid launch failed" - Is there any rule for grid and block size?

Hello all,

I got an error message from NEXUS: “CUDA grid launch failed”.
When I use
dim3 dimBlock_c(32, 1);
dim3 dimGrid_c(60, 1);
the kernel works well.

However, when I use
dim3 dimBlock_c(64, 1);
dim3 dimGrid_c(30, 1);
I get the error message from NEXUS: “CUDA grid launch failed”.

So I tried launching an empty kernel (one that does nothing) with various grid and block sizes.
Whether the launch succeeds depends on the configuration.
For example,
dim3 dimBlock_c(512, 1);
dim3 dimGrid_c(16, 1);
works, but
dim3 dimBlock_c(128, 1);
dim3 dimGrid_c(30, 1);
does not work.
Some configurations even make my PC shut down.

Has anyone else run into a similar problem?
Is there any rule for grid and block size?

I think it is a different problem, nothing to do with block and grid size.

Hopefully someone else can suggest what the problem is.

The rules for grid and block size are (from the Programming Guide, version 2.3, section A.1.1):

  • The maximum number of threads per block is 512;
  • The maximum sizes of the x-, y-, and z-dimension of a thread block are 512, 512,
    and 64, respectively;
  • The maximum size of each dimension of a grid of thread blocks is 65535;
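
As a rough illustration, the limits above can be checked on the host before a launch. This is only a minimal sketch, assuming the limits quoted from the Programming Guide 2.3 apply to the device in use. `Dim3` and `launchConfigWithinLimits` are hypothetical names made up for this example, not part of the CUDA API; `Dim3` merely stands in for `dim3` so the snippet builds without CUDA headers.

```cpp
// Host-only sketch: plain C++ that also compiles inside a .cu file.
// Dim3 is a hypothetical stand-in for CUDA's dim3 so no CUDA header is needed.
struct Dim3 {
    unsigned int x;
    unsigned int y;
    unsigned int z;
};

// Limits quoted from the Programming Guide 2.3, section A.1.1.
// Assumption: other devices may differ; in real code, query the device
// with cudaGetDeviceProperties() rather than hard-coding these values.
const unsigned int kMaxThreadsPerBlock = 512;
const unsigned int kMaxBlockDimX = 512;
const unsigned int kMaxBlockDimY = 512;
const unsigned int kMaxBlockDimZ = 64;
const unsigned int kMaxGridDim = 65535;

// Returns true if the block and grid sizes respect the limits above.
bool launchConfigWithinLimits(const Dim3& block, const Dim3& grid) {
    // No dimension may be zero (a zero yields zero threads or zero blocks).
    if (block.x == 0 || block.y == 0 || block.z == 0) return false;
    if (grid.x == 0 || grid.y == 0 || grid.z == 0) return false;

    // Per-dimension block limits.
    if (block.x > kMaxBlockDimX) return false;
    if (block.y > kMaxBlockDimY) return false;
    if (block.z > kMaxBlockDimZ) return false;

    // Total threads per block is the product of the three block dimensions.
    unsigned long long threads =
        (unsigned long long)block.x * block.y * block.z;
    if (threads > kMaxThreadsPerBlock) return false;

    // Per-dimension grid limit, applied to each grid dimension.
    if (grid.x > kMaxGridDim) return false;
    if (grid.y > kMaxGridDim) return false;
    if (grid.z > kMaxGridDim) return false;

    return true;
}
```

Under these limits, a (64, 1) block with a (30, 1) grid and a (128, 1) block with a (30, 1) grid both pass the check, which is consistent with the launch failure having some other cause.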

Could you post the code where you launch your empty kernel and it crashes?

The code is simple:

dim3 dimBlock_o(16, 16);
dim3 dimGrid_o(NUM_BLOCK_TRAVERSE, 1);

// This kernel does some work and stores the results in "result"
DoTest<<<dimGrid_o, dimBlock_o>>> (result, ... );

dim3 dimBlock_c(64, 1);
dim3 dimGrid_c(30, 1);

emptyKernel<<<dimGrid_c, dimBlock_c>>> (result, numResult);

__global__ void countOverlap (CUDA_result* result, int* numOverlap)
{
  // empty
}

You cannot have 0 in any dimension :)
You launch 64x0=0 threads per block!

Oh, sorry.

I mistyped that.

I used (64, 1) and it doesn’t work.

What could be the reason?


Make sure you have devdriver 270.81 installed.
I had the same issue, and noticed I was using NSight 2.0 with devdriver 285.67 installed.
That devdriver is too new to be compatible with the old version of NSight.
Simply switching back to the old driver or upgrading to NSight 2.1 should solve the problem.

The solution for the ‘grid launch failed’ error when using Parallel NSight is definitely making sure you’re using the right drivers. For the currently stable 2.0 version, this is 270.81 drivers. For the 2.1 RC2 this is 285.86 drivers. Cheers!