nppiFilterGauss_8u_C3R, nppiFilterGaussAdvanced_8u_C3R, nppiFilterGaussBorder_8u_C3R

I tested several color images with the Gaussian filter functions. With image sizes of 4096 * 10000 or 8192 * 10000 they did not work properly. With a somewhat smaller image, such as 1200 * 2000, they worked properly.

Graphics card: NVIDIA GeForce RTX 2070 Super

CUDA: 11.6

Some suggestions:

  1. update to the latest available CUDA version (12.8.1, currently). Bugs get fixed all the time.
  2. describe what you mean by “did not work properly”
  3. provide a complete test case

2. The result image is the same as the original image, without any changes.
3. Test code:

extern "C" int nppi_GaussFilter(unsigned char* pSrcData, unsigned char* pDstData, int iHeight, int iWidth, int iChannel)
{
    NppStatus t_NppStatus;
    int srcElements = iHeight * iWidth * iChannel;
    int dstElements = iHeight * iWidth * iChannel;

    // target data on device
    unsigned char* dstDevData;
    cudaMalloc((void**)&dstDevData, sizeof(Npp8u) * dstElements);
    // source image data on device
    unsigned char* srcDevData;
    cudaMalloc((void**)&srcDevData, sizeof(Npp8u) * srcElements);
    cudaMemcpy(srcDevData, pSrcData, sizeof(Npp8u) * srcElements, cudaMemcpyHostToDevice);

    int iSrcStep = iWidth * iChannel;
    NppiSize oSizeROI;
    oSizeROI.width  = iWidth;
    oSizeROI.height = iHeight;

    t_NppStatus = nppiFilterGauss_8u_C3R(srcDevData, iSrcStep, dstDevData, iSrcStep, oSizeROI, NppiMaskSize::NPP_MASK_SIZE_11_X_11);

    cudaMemcpy(pDstData, dstDevData, sizeof(Npp8u) * dstElements, cudaMemcpyDeviceToHost);

    cudaFree(srcDevData);
    cudaFree(dstDevData);
    return t_NppStatus;
}

You haven’t properly offset your image to allow for filter mask positioning.

You cannot run a mask on every input pixel. The mask will cross over the image boundary, resulting in illegal access. Your code is broken.

I give an example of image offsetting in the linked example I provided. Yes, I understand it appears to work in some cases. Run any case you like under compute-sanitizer if you want more info. Your posted code throws errors in compute-sanitizer even for an image size of 1000x1000.
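For readers without the linked example at hand, the offsetting arithmetic can be sketched as follows. This is only a sketch under stated assumptions: the mask is centered on the pixel being filtered, the data is 8-bit interleaved, and `InteriorRoi` and `interiorRoi` are hypothetical helper names, not part of NPP. For an 11x11 mask the radius is 5, so the ROI shrinks by 10 pixels in each dimension and both pointers move in by 5 rows and 5 pixels.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for illustration; not part of NPP.
struct InteriorRoi {
    int width;               // ROI width in pixels
    int height;              // ROI height in pixels
    std::size_t byteOffset;  // offset from the image base to the first ROI pixel
};

// Computes the largest ROI for which a centered maskSize x maskSize kernel
// never reads outside a width x height image with the given row pitch.
// Assumes 8-bit channels, so one pixel occupies `channels` bytes.
inline InteriorRoi interiorRoi(int width, int height, int channels,
                               std::size_t stepBytes, int maskSize) {
    assert(maskSize > 0 && maskSize % 2 == 1);  // centered masks are odd-sized
    const int radius = maskSize / 2;
    assert(width > 2 * radius && height > 2 * radius);

    InteriorRoi r;
    r.width = width - 2 * radius;
    r.height = height - 2 * radius;
    // Skip `radius` rows, then `radius` pixels within the row.
    r.byteOffset = static_cast<std::size_t>(radius) * stepBytes
                 + static_cast<std::size_t>(radius) * static_cast<std::size_t>(channels);
    return r;
}

// Possible use with the posted code (untested here, shown as an assumption):
//   InteriorRoi r = interiorRoi(iWidth, iHeight, iChannel, iSrcStep, 11);
//   NppiSize roi = { r.width, r.height };
//   nppiFilterGauss_8u_C3R(srcDevData + r.byteOffset, iSrcStep,
//                          dstDevData + r.byteOffset, iSrcStep,
//                          roi, NPP_MASK_SIZE_11_X_11);
```

With this approach the outer 5-pixel frame of the destination is never written, so it needs to be initialized or copied separately if it matters. The Border variant named in the thread title takes the source size, an offset, and a border type, and may be the more convenient route when the edges have to be filtered as well.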

You may also wish to use proper CUDA error checking.
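One common way to do that is to wrap every runtime and NPP call in a checking helper. The sketch below is an illustration, not an official API: `checkStatus`, `CHECK_CUDA`, and `CHECK_NPP` are hypothetical names, and the two macros assume that `<cuda_runtime.h>` and `<npp.h>` are included wherever they are expanded.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of CUDA or NPP: throws std::runtime_error
// if `status` differs from the expected success value `ok`.
template <typename Status>
void checkStatus(Status status, Status ok, const char* expr,
                 const char* file, int line) {
    if (status != ok) {
        std::ostringstream msg;
        msg << file << ":" << line << ": " << expr
            << " failed with status " << static_cast<long long>(status);
        throw std::runtime_error(msg.str());
    }
}

// These macros only compile where the CUDA runtime and NPP headers are
// included, because they refer to cudaSuccess and NPP_SUCCESS.
#define CHECK_CUDA(call) checkStatus((call), cudaSuccess, #call, __FILE__, __LINE__)
#define CHECK_NPP(call)  checkStatus((call), NPP_SUCCESS, #call, __FILE__, __LINE__)

// Possible use with the posted code (shown as an assumption):
//   CHECK_CUDA(cudaMalloc((void**)&srcDevData, sizeof(Npp8u) * srcElements));
//   CHECK_NPP(nppiFilterGauss_8u_C3R(/* ... */));
//   CHECK_CUDA(cudaMemcpy(pDstData, dstDevData, sizeof(Npp8u) * dstElements,
//                         cudaMemcpyDeviceToHost));
```

Note that this sketch treats any status other than success as a failure, which includes NPP's positive warning codes. For CUDA runtime errors, `cudaGetErrorString` can be used to turn the numeric status into a readable message.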

It seems like the mask size offsetting topic was previously pointed out to you.