Starting kernel doesn't work with dim3 blockSize

So, I am trying to perform some operations on images.
It works with one-dimensional grids. When I call my kernel in my code with:

myKernel<<<width*height/threads,threads>>>(in, out, width, height);

it works fine.

Basically my kernel looks like this:

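__global__ void myKernel (uint8_t* in, float3* out, int w, int h) {

int index = blockIdx.x*blockDim.x + threadIdx.x;
int neighborspan = 1;

// True for pixels within neighborspan of the image border (bottom rows, top rows, left columns, right columns)
if (index >= w * (h - neighborspan) || index < (neighborspan*w) || index % w < neighborspan || index % w >= (w - neighborspan))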
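For reference, the four clauses of that condition test the bottom rows, the top rows, the left columns and the right columns, in that order. Below is a minimal host-side C++ sketch of the same test, so it can be checked without a GPU. The function names `isBorder` and `countInterior` are hypothetical and not part of the original kernel.

```cpp
#include <cassert>

// Same border test as the 1D kernel, on a linear pixel index x.
// Returns true when pixel x lies within n pixels of any image edge.
bool isBorder(int x, int w, int h, int n) {
    return x >= w * (h - n)   // bottom n rows
        || x < n * w          // top n rows
        || x % w < n          // left n columns
        || x % w >= w - n;    // right n columns
}

// Counts the pixels that pass the test, i.e. the ones a stencil can process.
int countInterior(int w, int h, int n) {
    int count = 0;
    for (int x = 0; x < w * h; ++x)
        if (!isBorder(x, w, h, n)) ++count;
    return count;
}
```

For an image with w > 2n and h > 2n, the interior count should equal (w - 2n) * (h - 2n), which is a quick sanity check on the condition.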



In this form I get my desired result. However, because I want to use a 2D grid to address patches in my image more effectively, I tried to call the kernel with dim3 variables for the grid and the threads:

	unsigned int N = width*height;
	dim3 threadsPerBlock(threads, threads);
	dim3 numBlocks(N / threadsPerBlock.x,N/threadsPerBlock.y);
	myKernel<<<numBlocks, threadsPerBlock>>>(in, out, width, height);

I updated my kernel such that the index calculation is different. I found several versions online which I tried. For example like this:

__global__ void myKernel (uint8_t* in, float3* out, int w, int h) {

   int i = blockIdx.x * blockDim.x + threadIdx.x;
   int j = blockIdx.y * blockDim.y + threadIdx.y;
   if (i > w - 1 || j > h - 1) {



I also tried the indexing from my 1D version and several others. I also forced the output at the computed index to the RGB color 256,256,256 and, in case the index is always 0, set a fixed number of pixels to 256,256,256. Whatever I tried, the image I receive remains completely black. So I guess it's not about the indexing but about my kernel call with myKernel<<<numBlocks, threadsPerBlock>>>, and that somehow the kernel is not called at all: I receive no feedback whatsoever, even if I remove the out-of-bounds index check at the start of the kernel, which throws an error when I do that with the working version.
Is my guess right, and what did I do wrong here?

Hi Alex,

My first observation is that, if you are segmenting the image into 2D blocks, numBlocks should be:

dim3 numBlocks(width / threadsPerBlock.x, height/threadsPerBlock.y);

So basically, this partitions the image into blocks. Let’s say that our block is 64x64 and our image is 512x512. Our image will then have 8 blocks per row and 8 blocks per column, for a total of 64 blocks. If each block is composed of 64x64=4096 pixels and we have 64 blocks, that gives 64*4096=262,144 pixels, which matches 512x512.
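The same arithmetic can be sketched on the host in plain C++. This is only an illustration under stated assumptions: `Dim2` is a hypothetical stand-in for CUDA's dim3, and `ceilDiv`, `gridFor` and `everyPixelOnce` are made-up helper names. A plain integer division such as width / threadsPerBlock.x is exact when the image size is a multiple of the block size. For other sizes it rounds down and leaves a strip of pixels uncovered, so the sketch rounds up and relies on the in-kernel bounds check to discard the surplus threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3, so the sketch compiles without nvcc.
struct Dim2 { unsigned x, y; };

// Ceiling division: number of blocks of size b needed to cover n elements.
unsigned ceilDiv(unsigned n, unsigned b) { return (n + b - 1) / b; }

// Grid dimensions for a width x height image tiled by `block`.
Dim2 gridFor(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ ceilDiv(width, block.x), ceilDiv(height, block.y) };
}

// CPU emulation of the 2D index scheme: walks every (block, thread) pair,
// applies the same bounds check as the kernel, and verifies that each
// pixel is reached exactly once.
bool everyPixelOnce(unsigned w, unsigned h, Dim2 block) {
    Dim2 grid = gridFor(w, h, block);
    std::vector<int> hits(static_cast<std::size_t>(w) * h, 0);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    unsigned i = bx * block.x + tx;   // column, as in the kernel
                    unsigned j = by * block.y + ty;   // row, as in the kernel
                    if (i > w - 1 || j > h - 1) continue;  // out-of-bounds guard
                    ++hits[static_cast<std::size_t>(j) * w + i];
                }
    for (int c : hits)
        if (c != 1) return false;
    return true;
}
```

With a 512x512 image and 64x64 tiles, `gridFor` gives the 8x8 grid from the example above. With a 500x300 image and 16x16 tiles it gives 32x19 blocks, where truncating division would give 31x18 and miss the last 4 columns and 12 rows.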

Regarding the black image, it can be due to several factors:

  1. The GPU is not alive
  2. You are not copying the memory back properly
  3. You are not altering the pixel.

Debugging the kernel with Nsight Tools may be helpful.
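One further possibility that would fit a kernel that seems never to run is a launch configuration the device rejects. The sketch below checks a configuration in plain C++ against the documented limits for compute capability 3.0 and newer. Those limits are hard-coded here as an assumption. `LaunchDims` and `checkLaunch` are hypothetical names, and the value of `threads` and the image size in the usage note are assumptions, since the thread does not give them.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for dim3, wide enough to avoid overflow in the checks.
struct LaunchDims { unsigned long long x, y, z; };

// Limits for compute capability 3.0 and newer (assumed, not queried).
// Real code could read them from cudaGetDeviceProperties instead.
constexpr unsigned long long kMaxThreadsPerBlock = 1024;
constexpr unsigned long long kMaxBlockX = 1024, kMaxBlockY = 1024, kMaxBlockZ = 64;
constexpr unsigned long long kMaxGridX = 2147483647ULL, kMaxGridY = 65535, kMaxGridZ = 65535;

// Returns an empty string if the configuration is within the limits,
// otherwise a short description of the first violated limit.
std::string checkLaunch(LaunchDims grid, LaunchDims block) {
    if (grid.x == 0 || grid.y == 0 || grid.z == 0 ||
        block.x == 0 || block.y == 0 || block.z == 0)
        return "a dimension is zero";
    if (block.x > kMaxBlockX || block.y > kMaxBlockY || block.z > kMaxBlockZ)
        return "block dimension too large";
    if (block.x * block.y * block.z > kMaxThreadsPerBlock)
        return "too many threads per block";
    if (grid.x > kMaxGridX || grid.y > kMaxGridY || grid.z > kMaxGridZ)
        return "grid dimension too large";
    return "";
}
```

Under these limits a 64x64 thread block (4096 threads) is rejected, while 16x16 is accepted. A grid of (N/16, N/16) with N = 1920*1080 is also rejected, because its y dimension exceeds 65535. In an actual CUDA program, calling cudaGetLastError() right after the launch and checking the return value of cudaDeviceSynchronize() reports this kind of failure, which otherwise passes silently.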

Sharing a minimal piece of code that reproduces your error may also be helpful, since the problem is hard to pin down from a description alone.


dim3 numBlocks(width / threadsPerBlock.x, height/threadsPerBlock.y);

This did the trick for me. Thank you!
