Kernel failure: unspecified launch failure

Hey gentle people,

I have a new problem.

Because I don’t know in advance how many times I have to execute a kernel, I tried implementing it like this:

if (NBlocks[i] < 16)
{
	block_size.x = NBlocks[i];
	grid_size.x = 1;
	kernel<<<grid_size, block_size>>>(m_video2, m_motion, indicesGPU, m_width, m_height, block_size.x, 0);
}
else
{
	block_size.x = 16;
	grid_size.x = NBlocks[i]/16;
	kernel<<<grid_size, block_size>>>(m_video2, m_motion, indicesGPU, m_width, m_height, block_size.x, 0);
	if (NBlocks[i]%16 != 0)
	{
		//printf("Prijs weer in Motion %d met %d\n",i,NBlocks[i]%16);
		block_size.x = NBlocks[i]%16;
		grid_size.x = 1;
		kernel<<<grid_size, block_size>>>(m_video2, m_motion, indicesGPU, m_width, m_height, block_size.x, NBlocks[i]/16);
	}
}

If I uncomment the printf, my output screen fills up with a lot of text, but at least the program runs. If the line stays commented out, I get kernel launch failures.

Is there any way to solve this issue?

Kind regards

Niels

This may be caused by queueing too many kernels. Try adding cudaThreadSynchronize() after each kernel invocation.
Also make sure that you’re not exceeding the maximum block size for your kernel (check the kernel’s register and shared memory usage).
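
To make the advice above concrete, here is a minimal sketch of checking every launch for errors. It also shows one common way to avoid the separate "remainder" launch, by rounding the grid size up and guarding the index inside the kernel. Everything named here is an assumption for illustration, not the code from the question: touch_items is a stand-in kernel, CUDA_CHECK is a hypothetical helper macro, and n_items plays the role of NBlocks[i]. It uses cudaDeviceSynchronize(), since cudaThreadSynchronize() is deprecated in newer CUDA toolkits. Whether a single rounded-up launch fits the real kernel depends on how it uses its last two arguments, which the post does not show.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort with a readable
// message if a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: one thread per work item. The bounds guard
// makes the surplus threads of a partially filled last block do nothing.
__global__ void touch_items(int *out, int n_items)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n_items)
        out[idx] = idx;
}

int main()
{
    const int n_items = 37;            // stands in for NBlocks[i]; not a multiple of 16
    const int threads_per_block = 16;

    int *d_out = NULL;
    CUDA_CHECK(cudaMalloc(&d_out, n_items * sizeof(int)));

    // Round the block count up so the remainder is covered by the same launch.
    dim3 block_size(threads_per_block);
    dim3 grid_size((n_items + threads_per_block - 1) / threads_per_block);

    touch_items<<<grid_size, block_size>>>(d_out, n_items);
    CUDA_CHECK(cudaGetLastError());        // catches invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());   // waits, and catches failures during execution

    int h_out[n_items];
    CUDA_CHECK(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));
    printf("last item = %d\n", h_out[n_items - 1]);

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Checking right after each launch tells you which launch actually failed; without the synchronize, an error from an earlier kernel is typically only reported by a later call. Note also that a launch with zero threads per block or zero blocks is rejected as an invalid configuration, so a count of 0 would need to be skipped before launching.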