nppiMalloc_8u_C1 returning 0

Hi all,

I am new to NPP and am trying to use it for some image processing functions. I tried running the box filter sample. The code compiles fine, but it crashes at run time with the following error:

Program error! The following exception occurred:
c:\cuda\npp\sdk\common\utilnpp\imageallocatorsnpp.h:75: pResult != 0 assertion faild!
Aborting.

I also wrote a separate test program, and I get the same problem there.

Here is the code:

int stepBytesKnl=0;
Npp32s *d_kernel = nppiMalloc_32s_C1(kernelwidth, kernelheight, &stepBytesKnl);

Even after nppiMalloc has returned, d_kernel is undefined and stepBytesKnl is still zero.

Any help would be greatly appreciated.
Thanks!

I am using v3.2. I am completely stuck at this point. Please help!!