So, I am trying to perform some operations on images.
It works with one-dimensional grids. When I launch my kernel with:
myKernel<<<width*height/threads,threads>>>(in, out, width, height);
it works fine.
Basically my kernel looks like this:
__global__ void myKernel (uint8_t* in, float3* out, int w, int h) {
int index = blockIdx.x*blockDim.x + threadIdx.x;
int neighborspan = 1;
if (index >= w * (h - neighborspan) || index < (neighborspan * w) || index % w < neighborspan || index % w >= (w - neighborspan))
{
return;
}
doStuff();
}
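A minimal sketch of the 1D version, assuming a row-major image with one output pixel per thread. The border test is moved into a hypothetical helper `isInteriorPixel1D` that spells out the column (`index % w`) and row (`index / w`) it checks. A second hypothetical helper, `blocksFor`, rounds the block count up, because `width*height/threads` truncates and would skip the last pixels whenever the pixel count is not a multiple of `threads`. Writing white to `out` is only a placeholder for `doStuff()`.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: true if the pixel at linear index `index` of a row-major
// w x h image lies at least `span` pixels away from every border.
// Column = index % w, row = index / w.
__host__ __device__ constexpr bool isInteriorPixel1D(int index, int w, int h, int span)
{
    return index / w >= span && index / w < h - span &&
           index % w >= span && index % w < w - span;
}

// Hypothetical helper: blocks needed to cover n threads, rounded up so no pixel is left out.
__host__ __device__ constexpr int blocksFor(int n, int threads)
{
    return (n + threads - 1) / threads;
}

__global__ void myKernel1D(const uint8_t* in, float3* out, int w, int h)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    const int neighborspan = 1;
    if (!isInteriorPixel1D(index, w, h, neighborspan))
        return;  // border pixel, or a surplus thread past the end of the image
    // Placeholder for doStuff(): paint the interior white.
    out[index] = make_float3(255.0f, 255.0f, 255.0f);
}
```

It would be launched as `myKernel1D<<<blocksFor(width * height, threads), threads>>>(in, out, width, height);`. The surplus threads in the last block have a row index of at least `h`, so they fail the border test and return.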
In this form I get the desired result. However, to address patches of my image more effectively, I tried to use a 2D grid and launched the kernel with dim3 variables for the grid and block dimensions:
unsigned int N = width*height;
dim3 threadsPerBlock(threads, threads);
dim3 numBlocks(N / threadsPerBlock.x,N/threadsPerBlock.y);
myKernel<< <numBlocks,threadsPerBlock >> >(in, out, width, height);
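For comparison, a minimal sketch of sizing a 2D launch per axis, assuming square blocks of `tile` x `tile` threads (`divUp`, `gridFor2D` and `tile` are hypothetical names). Each grid dimension is the image extent along that axis rounded up to whole blocks, so `grid.x` comes from `width` and `grid.y` from `height`, rather than both coming from the total pixel count `N`. The asserts encode two CUDA limits: a block may contain at most 1024 threads, and `gridDim.y` may be at most 65535.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks of size `block` needed to cover `n` items.
inline unsigned int divUp(unsigned int n, unsigned int block)
{
    return (n + block - 1) / block;
}

// Hypothetical helper: grid for a width x height image using tile x tile blocks.
inline dim3 gridFor2D(unsigned int width, unsigned int height, unsigned int tile)
{
    assert(tile * tile <= 1024);  // maximum threads per block
    dim3 grid(divUp(width, tile), divUp(height, tile));
    assert(grid.y <= 65535);      // maximum gridDim.y
    return grid;
}
```

It would be used as `dim3 threadsPerBlock(16, 16); dim3 numBlocks = gridFor2D(width, height, 16);` before the launch. A 16 x 16 block (256 threads) is a common choice but not a requirement.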
I updated the index calculation in my kernel and tried several versions I found online, for example this one:
__global__ void myKernel (uint8_t* in, float3* out, int w, int h) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;
if (i > w - 1 || j > h - 1) {
return;
}
doStuff();
}
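In the 2D version the buffers are still flat arrays, so the (i, j) coordinates have to be turned back into a linear index before reading `in` or writing `out`, and the check above only rejects threads outside the image, not the `neighborspan` border of the 1D version. A minimal sketch that does both, assuming a row-major layout; `isInteriorPixel2D`, `linearIndex` and `myKernel2D` are hypothetical names, and writing white is only a placeholder for `doStuff()`.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: the 1D border test expressed on 2D coordinates,
// where i is the column (x) and j the row (y).
__host__ __device__ constexpr bool isInteriorPixel2D(int i, int j, int w, int h, int span)
{
    return i >= span && i < w - span && j >= span && j < h - span;
}

// Hypothetical helper: row-major linear index of pixel (i, j).
__host__ __device__ constexpr int linearIndex(int i, int j, int w)
{
    return j * w + i;
}

__global__ void myKernel2D(const uint8_t* in, float3* out, int w, int h)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row
    const int neighborspan = 1;
    if (!isInteriorPixel2D(i, j, w, h, neighborspan))
        return;  // border pixel or outside the image
    int index = linearIndex(i, j, w);
    // Placeholder for doStuff(): paint the interior white.
    out[index] = make_float3(255.0f, 255.0f, 255.0f);
}
```

With this layout, pixel (i, j) of the 2D version and `index` of the 1D version refer to the same element whenever `index == j * w + i`.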
I also tried the indexing from my 1D version and several other variants, and I forced the output at the computed index to be the RGB color (256, 256, 256). In case the index was always 0, I also tried setting a fixed number of pixels to (256, 256, 256), but whatever I try, the image I get back stays completely black. So I guess it is not about the indexing but about the kernel call myKernel<< <numBlocks,threadsPerBlock >> >, and that somehow the kernel is not called at all: I get no feedback whatsoever, even when I remove the out-of-bounds check at the start of the kernel, which throws an error when I do the same in the working 1D version.
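One way to test the guess that the kernel never runs: a kernel launch does not return a status itself, so a rejected launch (for example an invalid launch configuration) stays silent unless the error is queried. A minimal sketch, with `checkLaunch` and `emptyKernel` as hypothetical names: `cudaGetLastError()` reports errors from the launch itself, and `cudaDeviceSynchronize()` waits for the kernel and reports errors raised while it ran.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel, only used to demonstrate the check.
__global__ void emptyKernel() {}

// Hypothetical helper: reports the status of the most recent kernel launch.
// Returns true only if the launch was accepted and the kernel finished without error.
inline bool checkLaunch(const char* label)
{
    cudaError_t err = cudaGetLastError();  // launch and configuration errors
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: launch failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    err = cudaDeviceSynchronize();         // errors raised while the kernel ran
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: execution failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Calling `checkLaunch("myKernel")` right after `myKernel<<<numBlocks, threadsPerBlock>>>(in, out, width, height);` prints the CUDA error string if the launch was rejected. For instance, `emptyKernel<<<1, dim3(64, 64)>>>()` asks for 4096 threads per block, more than the 1024 allowed, so the check returns false instead of the kernel silently not running.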
Is my guess right, and what did I do wrong here?