nppsMean_32fc corrupts device memory

I’ve been trying to use nppsMean_32fc, but after I call it, cudaMemGetInfo reports 0 memory on the device and subsequent device allocations fail.

I’ve reproduced this in a modified version of the boxFilterNPP sample solution, which I have attached as a zip file.

Here is the new code inserted at line 186:

// new code ****************************************************************************
// new code ****************************************************************************
	// test NppsMean for complex val buffer
	int nsize = oSizeROI.width * oSizeROI.height;
	fComplex *pfc_img_host = new fComplex[nsize];
	Npp8u *psrc = oHostSrc.data();
	for(int i = 0; i < nsize; i++)
	{
		pfc_img_host[i] = std::complex<float>((float) psrc[i], 0.0f);
	}

		
	Npp32fc *pfc_img_device;
	cudaMalloc((void **) &pfc_img_device, nsize * sizeof(Npp32fc));
	cudaMemcpy(pfc_img_device, pfc_img_host, nsize * sizeof(Npp32fc), cudaMemcpyHostToDevice);

	size_t mem_free, mem_tot;
	cudaMemGetInfo  (&mem_free, & mem_tot);

	int hpBufferSize, nLength = nsize;//*sizeof(fcomplex);
	nppsMeanGetBufferSize_32fc (nLength, &hpBufferSize);
	cudaMemGetInfo  (&mem_free, & mem_tot);

	Npp8u *pDeviceBuffer; 
	cudaError_t custat = cudaMalloc((void **) &pDeviceBuffer, hpBufferSize);
	if(custat != cudaSuccess)
		throw std::runtime_error(std::string("cudaMalloc failed in file ") + __FILE__ + ", line " + std::to_string(__LINE__)); // needs <stdexcept>, <string>
	cudaDeviceSynchronize();
	cudaMemGetInfo  (&mem_free, & mem_tot);

	Npp32fc fcMean;
	NppStatus stat = nppsMean_32fc((const Npp32fc *)pfc_img_device, nLength, &fcMean, pDeviceBuffer);
	cudaDeviceSynchronize();
	cudaMemGetInfo  (&mem_free, & mem_tot);

	if(stat != NPP_NO_ERROR)
		throw std::runtime_error(std::string("nppsMean_32fc failed in file ") + __FILE__ + ", line " + std::to_string(__LINE__)); // needs <stdexcept>, <string>
	delete [] pfc_img_host;
	cudaFree(pfc_img_device);
	cudaFree(pDeviceBuffer);
	cudaMemGetInfo  (&mem_free, & mem_tot);
// end new code ****************************************************************************
// end new code ****************************************************************************

First off, I have zero experience with NPP. Here are some general recommendations for debugging:

(1) Add status checking to every API call. Things may start to go wrong earlier than currently assumed.

(2) Carefully review the arguments passed to API functions. For example, a casual look at the NPP documentation suggests that nLength in nppsMeanGetBufferSize_32fc() should be the size in bytes. If so, the current code allocates a scratch buffer that is too small, and nppsMean_32fc() then overwrites other data when it uses that buffer.

(3) From discussing previous issues with the NPP team that looked like bugs but turned out to be usage issues, I know that ROI configuration is a frequent source of confusion; for example, the required allocations can be larger than the size of the ROI alone suggests.
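Recommendation (1) can be applied uniformly with a small helper that turns any non-success status into an exception naming the failing call, file, and line. This is a minimal sketch in plain C++ so it compiles without CUDA headers; the names checkStatus and CHECK_STATUS are hypothetical, not part of CUDA or NPP. It is meant to wrap calls returning cudaError_t (success value cudaSuccess) or NppStatus (success value NPP_NO_ERROR), including cudaMemGetInfo itself, since the values it reports may not be meaningful if the call returned an error.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: throws std::runtime_error when `status` differs from
// the success value `ok`, recording the call text, source file and line.
// Intended for status enums such as cudaError_t or NppStatus.
template <typename Status>
void checkStatus(Status status, Status ok, const char *call,
                 const char *file, int line)
{
    if (status != ok) {
        std::ostringstream msg;
        msg << call << " returned status " << static_cast<long long>(status)
            << " in file " << file << ", line " << line;
        throw std::runtime_error(msg.str());
    }
}

// Wraps a call so the failing expression and its location end up in the message.
#define CHECK_STATUS(call, ok) checkStatus((call), (ok), #call, __FILE__, __LINE__)

// Intended usage (requires CUDA runtime and NPP headers; not compiled here):
//   CHECK_STATUS(cudaMalloc((void **)&pfc_img_device, nsize * sizeof(Npp32fc)), cudaSuccess);
//   CHECK_STATUS(nppsMeanGetBufferSize_32fc(nLength, &hpBufferSize), NPP_NO_ERROR);
//   CHECK_STATUS(cudaMemGetInfo(&mem_free, &mem_tot), cudaSuccess);
```

Checking every call this way pinpoints the first failing call, which may come earlier than the point where the symptom is noticed.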

If, after carefully checking your code, you still believe there is a bug in NPP, I suggest filing a bug report via the form linked from the registered developer website, attaching self-contained repro code.

According to NPP Library.pdf:
int nLength = 1024;
...
// Compute the appropriate size of the scratch-memory buffer
int nBufferSize;
nppsSumGetBufferSize_32f(nLength, &nBufferSize);
// Allocate the scratch buffer
cudaMalloc((void **)(&pDeviceBuffer), nBufferSize);

And the documentation for the function call says:

nLength is the number of samples, not the number of bytes.

This is typical of the NppS functions, unlike the NppI functions.
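Since nLength counts samples, the NPP calls take nLength itself, while the byte counts for cudaMalloc and cudaMemcpy are nLength * sizeof(Npp32fc). One way to check the value nppsMean_32fc produces is to compare it against a host-side reference. This is a minimal sketch assuming nppsMean_32fc returns the arithmetic mean of the nLength samples, with real and imaginary parts averaged independently; referenceMean and toComplex are hypothetical names, and std::complex<float> stands in for Npp32fc so the code compiles without NPP. It is also worth confirming in the NPP documentation whether pMean must point to device memory rather than to a host variable such as fcMean; if it must, the result has to be copied back with cudaMemcpy before comparing.

```cpp
#include <complex>
#include <vector>

// Hypothetical host-side reference for nppsMean_32fc.
// Assumption: the NPP result is the arithmetic mean of nLength complex samples.
// nLength is a sample count (NppS convention); the matching byte count for
// cudaMalloc/cudaMemcpy would be nLength * sizeof(std::complex<float>).
// Precondition: nLength > 0.
std::complex<float> referenceMean(const std::complex<float> *src, int nLength)
{
    // Accumulate in double precision to limit rounding error on long signals.
    std::complex<double> sum(0.0, 0.0);
    for (int i = 0; i < nLength; ++i) {
        sum += std::complex<double>(src[i].real(), src[i].imag());
    }
    return std::complex<float>(sum / static_cast<double>(nLength));
}

// Builds the same kind of input as the repro: 8-bit pixels become complex
// samples with a zero imaginary part.
std::vector<std::complex<float>> toComplex(const std::vector<unsigned char> &pixels)
{
    std::vector<std::complex<float>> out;
    out.reserve(pixels.size());
    for (unsigned char p : pixels) {
        out.emplace_back(static_cast<float>(p), 0.0f);
    }
    return out;
}
```

In the repro, the input would be the contents of pfc_img_host with nLength = nsize; agreement with the NPP value (once it is on the host) would indicate that nLength and the buffer size were passed correctly.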

How about running the attached solution, which I took great pains to create so that you could reproduce this?

As you’ll notice, I even added status checks to determine exactly where the problem occurs.

To avoid misunderstandings: this forum is designed as a platform for a programming community. It is not intended to be a bug reporting channel; that function is served by the bug reporting form I mentioned.

My participation in this forum happens “on the side”, next to my regularly scheduled work. I rarely have time to look into issues in detail, and instances where I do look into details have to do with functionality that I have some familiarity with. I am not familiar with NPP.

It appears I misunderstood the description of nppsMeanGetBufferSize_32fc() when I took a quick look at the documentation. Sorry about that, I did not mean to mislead.

I think filing a bug report would be the best way to get to the bottom of this issue. The reporting form is linked from the registered developer website. If you are not yet a registered developer, it is easy to sign up and the usual turn-around time is one business day.