I have an OpenCV GpuMat created as:
cv::cuda::GpuMat d_im(h_im.size().height, h_im.size().width, CV_8UC1);
I run some image-processing operations on it using cv::cuda, and then, following the OpenCV documentation, I tried to pass it directly to my kernel as:
Kernel_func<<<grid_size, block_size, 0, stream>>>( d_im.ptr<uint8_t>(), output);
but I got wrong results.
However, the results were correct when I downloaded the processed d_im from the GPU to the CPU and then copied it back to the GPU with cudaMemcpyAsync, as in the snippet below. I know this round trip is not a good solution.
CUDA_CHECK_RETURN(cudaMemcpyAsync(input, h_im_new.ptr<uint8_t>(), sizeof(uint8_t)*size, cudaMemcpyHostToDevice,stream1));
My kernel prototype is:
__global__ void Kernel_func(const uint8_t *input, const uint8_t *output);
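One thing I have not ruled out: as far as I understand, GpuMat rows can be padded (d_im.step may be larger than the width in bytes), while a Mat downloaded to the host is tightly packed, and my kernel indexes the buffer as if it were packed. Below is a minimal host-only C++ sketch of what that mismatch would look like. The sizes (kWidth, kHeight, kPitch) and the helper names are made up for illustration and are not my real values or code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes for illustration only: a 5x3 8-bit image whose rows
// are padded to 8 bytes each, mimicking a pitched GPU allocation.
constexpr int kWidth = 5;
constexpr int kHeight = 3;
constexpr std::size_t kPitch = 8;  // bytes per row, >= kWidth

// Build a pitched buffer: pixel (x, y) holds y * 10 + x, padding bytes hold 255.
std::vector<uint8_t> make_pitched() {
    std::vector<uint8_t> buf(kPitch * kHeight, 255);
    for (int y = 0; y < kHeight; ++y)
        for (int x = 0; x < kWidth; ++x)
            buf[y * kPitch + x] = static_cast<uint8_t>(y * 10 + x);
    return buf;
}

// Packed indexing: what a kernel does if it assumes rows are contiguous.
uint8_t read_packed(const uint8_t* data, int x, int y) {
    return data[y * kWidth + x];
}

// Pitched indexing: each row starts y * pitch bytes after the base pointer.
uint8_t read_pitched(const uint8_t* data, std::size_t pitch, int x, int y) {
    return data[y * pitch + x];
}
```

With these made-up sizes, the packed read of pixel (2, 1) lands on a padding byte of row 0, while the pitched read returns the intended pixel. If this is what is happening in my case, I suppose the kernel would need d_im.step as an extra argument, but I have not confirmed that.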
I am not sure what is wrong here. Has anyone had a similar issue, or does anyone have any suggestions? Thanks for your help.