NPP_STEP_ERROR when using the cross-correlation function from NPP


I am using nppiCrossCorrValid_NormLevel_8u32f_C1R to calculate the correlation between a source image and a template image. I keep getting NPP_STEP_ERROR.

The steps I am passing are neither zero nor less than (ROI width * pixel size). Since the input images are 8-bit, I am assuming the pixel size used for the step check is 1 byte.
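
The condition described above can be written out as a small C helper. This is only a sketch of the check as the question states it, not NPP's actual source. The helper name `step_is_plausible` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the step condition described in the question:
 * a row step (in bytes) is rejected if it is zero or negative, or if it is
 * smaller than the number of bytes one ROI row occupies. */
static bool step_is_plausible(int step_bytes, int roi_width, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    if (step_bytes <= 0)
        return false;
    return (size_t)step_bytes >= (size_t)roi_width * bytes_per_pixel;
}
```

For 8-bit single-channel images `bytes_per_pixel` is 1, so `step_is_plausible(1500, 384, 1)` and `step_is_plausible(1500, 301, 1)` both hold for the source and template steps used in the code.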

Here is my code and the parameters I am passing.

```c
srcStep = 1500;
srcRoiSize.width = 384;
srcRoiSize.height = 69;
tmplStep = 1500;
tmplRoiSize.width = 301;
tmplRoiSize.height = 61;
dstStep = 332; // Since the result is 83x8, the width is 83 and since the type is float, step = width*4

int nBufferSize = 0;
NppStatus stat = nppiValidNormLevelGetBufferHostSize_8u32f_C1R(srcRoiSize, &nBufferSize);
cudaMalloc((void **)&d_Buffer, nBufferSize); // pass the address of the device pointer

stat = nppiCrossCorrValid_NormLevel_8u32f_C1R(d_volume+srcOffset+startSrcImage, srcStep, srcRoiSize,
                                              d_volume+tmplOffset+startTmplImage, tmplStep, tmplRoiSize,
```
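
The destination step can be sanity-checked the same way, with 4-byte floats. This is a sketch assuming the "Valid" variant produces an output of (srcWidth - tplWidth + 1) x (srcHeight - tplHeight + 1) pixels, the usual definition of a valid-mode correlation (worth confirming against the NPP documentation for your version). Under that assumption the sizes above give 84 x 9 rather than the 83 x 8 assumed in the `dstStep` comment, so the minimum destination step would be 84 * 4 = 336 bytes, which is larger than the 332 being passed. `Size2D`, `valid_corr_size` and `min_float_step` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a width/height pair (similar in spirit to NppiSize). */
typedef struct { int width; int height; } Size2D;

/* Output size of a "valid" correlation, assuming only positions where the
 * template lies entirely inside the source ROI are kept. */
static Size2D valid_corr_size(Size2D src, Size2D tpl)
{
    assert(src.width >= tpl.width && src.height >= tpl.height);
    Size2D out = { src.width - tpl.width + 1, src.height - tpl.height + 1 };
    return out;
}

/* Smallest row step, in bytes, for a tightly packed 32-bit float image row. */
static int min_float_step(int width)
{
    return width * (int)sizeof(float);
}
```

With `src = {384, 69}` and `tpl = {301, 61}`, `valid_corr_size` returns `{84, 9}` and `min_float_step(84)` returns 336.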

Any help would be appreciated!