Hello,
can somebody help? I am writing code that loops through every pixel of an RGB image and converts it to grayscale, but it doesn't work. It compiles, but the results are not as expected.
I have defined
#define DIM 512
dim3 grid(DIM,DIM);
and my kernel call looks like this:
gpuConvert <<<grid, 1>>>(dev_ptrout, dev_ptrin);
My function looks like this:
__global__ void gpuConvert(unsigned char *b, const unsigned char *a)
{
// Calculate the index of the pointers based on pixel location for each thread
int x = blockIdx.x; // current row
int y = blockIdx.y; // current column
int dim= gridDim.x;
// Perform the conversion
int index = x + y*dim; // current pixel index
b[index] = (a[index]+ a[index+1]+ a[index+2])/3.0f;
}
Am I missing something?
Please help! I am lost
Apparently your input pixel format is RGB unsigned char. That means you have to account for the 3 bytes per pixel in your pointer calculations.
What about your output pixel format? Is it also RGB 3 bytes unsigned char? Or something else?
If it is a 3-byte output format, something like this might work:
int index = x + y*dim; // current pixel index
int iindex = 3*index; // compute byte offset
int oindex = 3*index;
unsigned char temp = (a[iindex]+ a[iindex+1]+ a[iindex+2])/3.0f;
b[oindex] = temp; // assuming 3 byte output format
b[oindex+1] = temp; // assuming 3 byte output format
b[oindex+2] = temp; // assuming 3 byte output format
if it’s a 1-byte output format, something like this:
int index = x + y*dim; // current pixel index
int iindex = 3*index; // compute byte offset
int oindex = index;
unsigned char temp = (a[iindex]+ a[iindex+1]+ a[iindex+2])/3.0f;
b[oindex] = temp; // assuming 1 byte output format
There’s a lot of other comments that could be made, but keep it simple until you get your code to work.
A quick question: I used the 1-byte output you suggested. It compiles, but the image appears gray only at the top (about 1/4 of the window) and the rest looks distorted…
When this happens, what should I look at first to fix it: the statements inside the GPU kernel, or the kernel call (by changing the parameters inside the <<<>>>)? Or something else?
I am lost here!
Based on the code you sent me privately, I would look at the sizes of the arrays you are allocating.
Thanks
You mean to change sizeof(unsigned char) to sizeof(something else)?
I already tried this, if that is what you mean
:(
Right now you have some code like this in your convertwithCUDA function:
// Allocate the input pointer on the device.
cudaStatus = cudaMalloc( (void**) & dev_ptrin, sizeof(unsigned char) * input.rows * input.cols);
// Allocate the output pointer on the device.
cudaStatus = cudaMalloc( (void**) & dev_ptrout, sizeof(unsigned char) * grayImage.rows * grayImage.cols);
Does it make sense to use sizeof(unsigned char) for both?
How many unsigned char does it take to represent a pixel in the input image?
How many unsigned char does it take to represent a pixel in the output image?
Even though you made the change in the kernel, you’re not really grasping that the input image has 3 bytes per pixel and the output image has 1 byte per pixel.
And you’ll need to modify your cudaMemcpy operations accordingly, as well.
I tried to change the sizeof from char to long in the following statements:
__global__ void gpuConvert(unsigned char *b, const unsigned long *a)
only the second parameter here
unsigned long temp = (a[offset]+ a[offset+1]+ a[offset+2])/3.0f;
this statement is inside the kernel
unsigned long *dev_ptrin;
cudaStatus = cudaMalloc( (void**) & dev_ptrin, sizeof(unsigned long ) * input.rows * input.cols);
cudaStatus = cudaMemcpy(dev_ptrin, input.data, sizeof(unsigned long ) * input.rows * input.cols, cudaMemcpyHostToDevice);
but it doesn't work. More likely, I'm not following your exact directions… that's why… but as I said, I am a self-learner and I am already learning a lot from you…
OK, should I change it to float or int instead?
also should I change the following
Mat gpuConvertedImage (inputImage.rows, inputImage.cols, CV_8UC1)
to
Mat gpuConvertedImage (inputImage.rows, inputImage.cols, CV_32FC1)
?
Any more help greatly appreciated.
Or does anyone know a good link to start with regarding my issue, to make everything clearer?
Try this blog post. It includes a completely worked sample code for what you are trying to do.
[url]http://www.programmerfish.com/how-to-write-a-custom-cuda-kernel-with-opencv-as-host-library/[/url]
Thank you. I will check it.
:)