Hello,
My goal is to copy data from global memory into shared memory, modify it there, and write it back to global memory. The kernel looks as follows:
#define BLOCK_SIZE 32

__global__
void find_defect_coordinates(unsigned char *input, unsigned int width, unsigned int height)
{
    unsigned int thread_x = threadIdx.x;
    unsigned int thread_y = threadIdx.y;
    unsigned int glob_x = (blockIdx.x * blockDim.x) + thread_x;
    unsigned int glob_y = (blockIdx.y * blockDim.y) + thread_y;
    unsigned char correction = 0;

    __shared__ unsigned char shInput[BLOCK_SIZE][BLOCK_SIZE];

    const int numTiles = width / BLOCK_SIZE;
    for (int t = 0; t < numTiles; t++)
    {
        const int tiledCol = BLOCK_SIZE * t + thread_x;
        shInput[thread_y][thread_x] = input[glob_y*width + (t*BLOCK_SIZE + thread_x)];
        __syncthreads();
        correction = shInput[thread_y][thread_x] - 50;
        __syncthreads();
        input[glob_y*width + (t*BLOCK_SIZE + thread_x)] = shInput[thread_y][thread_x];
        __syncthreads();
    }
}
The inputs are:
input = pointer to the picture data in global memory (its size is 8192 * 512 * sizeof(unsigned char))
width = width of the picture (8192)
height = height of the picture (512)
I am a beginner with CUDA. Could you please tell me what I am doing wrong here? After the kernel runs, I get a noisy image. Thank you very much for your advice.
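For comparison, here is a minimal sketch of what the kernel above appears intended to do. It is not a confirmed diagnosis of the noise. Three things can be read directly from the posted code: `correction` is computed but never stored, so `shInput` is written back unchanged; `glob_x` is never used, so every block along x loops over all tiles of its rows; and `value - 50` wraps around when stored in an `unsigned char` if the value is below 50. The sketch assumes one block per 32x32 tile and assumes results should be clamped at 0. The names `subtract_offset` and `run_subtract_offset` are hypothetical, and the host code is a guess because the post does not show the launch.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 32

// Hypothetical variant: one thread per pixel, one block per BLOCK_SIZE x BLOCK_SIZE tile.
__global__ void subtract_offset(unsigned char *input, unsigned int width, unsigned int height)
{
    __shared__ unsigned char shInput[BLOCK_SIZE][BLOCK_SIZE];

    unsigned int glob_x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int glob_y = blockIdx.y * blockDim.y + threadIdx.y;
    bool in_bounds = (glob_x < width) && (glob_y < height);

    // Stage this thread's pixel in shared memory.
    if (in_bounds)
        shInput[threadIdx.y][threadIdx.x] = input[glob_y * width + glob_x];

    // Kept outside the branch so every thread of the block reaches it.
    // Each thread only touches its own element here, so the barrier only
    // becomes necessary once threads read their neighbours' pixels.
    __syncthreads();

    if (in_bounds) {
        unsigned char v = shInput[threadIdx.y][threadIdx.x];
        // Assumption: clamp at 0 instead of letting the subtraction wrap around.
        input[glob_y * width + glob_x] = (v > 50) ? (unsigned char)(v - 50) : 0;
    }
}

// Hypothetical host helper: upload the image, run the kernel, download the result.
// Returns false if any CUDA call fails.
bool run_subtract_offset(std::vector<unsigned char> &image, unsigned int width, unsigned int height)
{
    assert(image.size() == (size_t)width * height);

    size_t bytes = image.size() * sizeof(unsigned char);
    unsigned char *d_input = nullptr;
    if (cudaMalloc(&d_input, bytes) != cudaSuccess)
        return false;

    cudaError_t err = cudaMemcpy(d_input, image.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(BLOCK_SIZE, BLOCK_SIZE);
        // Round up so partial tiles at the right and bottom edges are covered.
        dim3 grid((width + BLOCK_SIZE - 1) / BLOCK_SIZE,
                  (height + BLOCK_SIZE - 1) / BLOCK_SIZE);
        subtract_offset<<<grid, block>>>(d_input, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(image.data(), d_input, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_input);
    return err == cudaSuccess;
}
```

For the picture described above, this would be called as `run_subtract_offset(image, 8192, 512)` with `image` holding 8192 * 512 bytes, which gives a grid of 256 x 16 blocks. Checking the return value of every CUDA call, as the helper does, helps tell a kernel bug apart from a failed copy or launch.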