nppiFilterHoughLine_8u32f_C1R returns NPP_CUDA_KERNEL_EXECUTION_ERROR

I am trying to use the function nppiFilterHoughLine_8u32f_C1R, but I cannot get it to work and I cannot find any examples online. Everything I’ve tried returns an NppStatus of -1000 (NPP_CUDA_KERNEL_EXECUTION_ERROR). Here is a minimal example.

#include <stdio.h>
#include <cuda_runtime_api.h>
#include <nppi_filtering_functions.h>

int main(void)
{
	int width = 100;
	int height = 100;

	NppiSize oSizeROI = { width, height };
	NppPointPolar nDelta = { 1.0, 1.0 };
	int nMaxLineCount = 10;
	int hpBufferSize = 0;

	NppStatus ret = nppiFilterHoughLineGetBufferSize(oSizeROI, nDelta, nMaxLineCount, &hpBufferSize);

	Npp8u *pDeviceBuffer = NULL;
	cudaMalloc((void **)&pDeviceBuffer, hpBufferSize);

	Npp8u *pSrc = NULL;
	cudaMalloc((void **)&pSrc, width * height);
	cudaMemset(pSrc, 0, width * height);

	int nSrcStep = width;
	int nThreshold = 10;
	int pDeviceLineCount = 0;

	NppPointPolar *pDeviceLines = NULL;
	cudaMalloc((void **)&pDeviceLines, nMaxLineCount * sizeof(NppPointPolar));

	ret = nppiFilterHoughLine_8u32f_C1R(pSrc, nSrcStep, oSizeROI, nDelta, nThreshold,
										pDeviceLines, nMaxLineCount, &pDeviceLineCount, pDeviceBuffer);

	printf("nppiFilterHoughLine_8u32f_C1R = %i\n", ret);

	return 0;
}

I am running on Red Hat 7.5 with driver 470.57.02 and CUDA 11.4.

I compiled with: gcc main.c -Wall -I/usr/local/cuda/include -o hough -L/usr/local/cuda/lib64 -lcudart -lnppif

What am I doing wrong?

pDeviceLineCount needs to be (i.e. point to) an allocation for an integer in device memory. In the example above it is the address of a host variable, which the kernel cannot write to.

That fixed it. Thank you!

