I'm facing a very strange problem with CUDA when processing an image.
I launch the kernel with <<<image_height, image_width>>> parameters, so that each thread accesses one pixel of the 450x300 grayscale picture;
the number of threads per block is < 512. The result is that the array, allocated on both host and GPU with 450*300 elements, only contains a computed region of 300x300, whereas the other values are black.
If I blow the dimensions up to 450x450, the picture I get has reproduction errors at the bottom, of course, but the full width of the picture is computed as well.
Obviously the kernel does not interpret the arguments the way I want, with one block containing one row of my input image.
Instead, a block is aligned vertically.
How can I get my blocks aligned in the natural way for a picture:
300 blocks, each block containing 450 threads?
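
To make the layout I am after concrete, here is a minimal sketch. It assumes the image is stored row-major, 450 pixels wide and 300 pixels high, with one byte per pixel. The `invert` kernel and the `pixel_index` helper are hypothetical stand-ins for the real kernel, which is not shown in this post; the only point is the index mapping, where `blockIdx.x` selects the row and `threadIdx.x` selects the column.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Row-major offset of pixel (row, col) in an image that is `width` pixels wide.
__host__ __device__ inline int pixel_index(int row, int col, int width) {
    return row * width + col;
}

// Hypothetical per-pixel operation (grayscale inversion); a placeholder only.
__global__ void invert(const unsigned char* in, unsigned char* out,
                       int width, int height) {
    int row = blockIdx.x;   // one block per image row
    int col = threadIdx.x;  // one thread per pixel within that row
    if (row < height && col < width) {
        int i = pixel_index(row, col, width);
        out[i] = 255 - in[i];
    }
}

int main() {
    const int width = 450, height = 300;  // assumed: 450 columns, 300 rows
    const size_t n = static_cast<size_t>(width) * height;

    // Synthetic input so the result can be checked on the host.
    std::vector<unsigned char> h_in(n), h_out(n, 0);
    for (size_t i = 0; i < n; ++i) h_in[i] = static_cast<unsigned char>(i % 256);

    unsigned char *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n) != cudaSuccess ||
        cudaMalloc(&d_out, n) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_in, h_in.data(), n, cudaMemcpyHostToDevice);

    // Grid size first, block size second: 300 blocks of 450 threads each.
    invert<<<height, width>>>(d_in, d_out, width, height);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out.data(), d_out, n, cudaMemcpyDeviceToHost);

    // Count pixels that were not processed as expected.
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_out[i] != static_cast<unsigned char>(255 - h_in[i])) ++bad;
    std::printf("mismatched pixels: %zu\n", bad);

    cudaFree(d_in);
    cudaFree(d_out);
    return bad == 0 ? 0 : 1;
}
```

In this sketch the first launch parameter is the number of blocks (the image height) and the second is the number of threads per block (the image width), and the row stride inside the kernel is the width, not the height.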