nppiFilterRow_8u_C1R returns error -1000 (KERNEL_EXECUTION_ERROR)

I have edited this post to describe a new issue.
I am using NVIDIA Nsight to develop an application that rotates and blurs images, and I am using the NPP libraries for both operations.
Since I want to convolve along rows only, I am using nppiFilterRow.

npp::ImageNPP_8u_C1 oDeviceDst(768 , 768);
Npp32s masksize = (Npp32s)KERNEL_LENGTH;
Npp32s anchor = (Npp32s)KERNEL_RADIUS;
NppiSize SzROI ={(int)oDeviceDst_gauss.width(),(int)oDeviceDst_gauss.height()};
std::cout << " anchor " << anchor << std::endl;
std::cout << " masksize " << masksize << std::endl;

NppStatus status2 = nppiFilterRow_8u_C1R(, oDeviceDst.pitch(),,
oDeviceDst_gauss.pitch(), SzROI, d_Kernel, masksize , anchor, 1);

 std::cerr << "status of blur is" << status2 << std::endl;

Error: status of blur is -1000
KERNEL_LENGTH is 2 * KERNEL_RADIUS + 1.
KERNEL_RADIUS is 8.
I have set nDivisor = 1.
On execution I get status -1000.
I edited my post so that I would not have to add a comment.
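
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: with a mask of 17 and an anchor of 8, each output pixel is computed from the 8 source pixels on either side of it. If the non-border filter variant reads those neighbours without clamping, then an ROI that spans the full 768-pixel width would make the kernel read outside each row. Below is a minimal sketch of how a narrower ROI could be computed for a symmetric kernel. SafeRowRoi and safeRowRoi are hypothetical names for illustration, not part of NPP.

```cpp
#include <cassert>

// Hypothetical helper type (not part of NPP): the column range whose
// full neighbourhood [x - radius, x + radius] lies inside the image row.
struct SafeRowRoi {
    int xOffset; // first column that can be filtered safely
    int width;   // number of columns that can be filtered safely
    int height;  // rows are unaffected by a row-only filter
};

// Assumes a symmetric kernel: mask size = 2 * radius + 1, anchor = radius.
SafeRowRoi safeRowRoi(int imageWidth, int imageHeight, int radius) {
    assert(radius >= 0);
    assert(imageWidth > 2 * radius); // otherwise no column has a full neighbourhood
    assert(imageHeight > 0);
    return SafeRowRoi{radius, imageWidth - 2 * radius, imageHeight};
}
```

For the 768 x 768 image and radius 8 used here, safeRowRoi(768, 768, 8) gives xOffset = 8 and width = 752. The source and destination pointers would then be advanced by xOffset pixels (one byte each for 8u, single channel) and the ROI passed as {752, 768}. If the border pixels are needed as well, NPP's border-aware filter variants may be the better fit, but I have not verified that against this code.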

I am having trouble following what exactly the question is here. Are you saying that your application works fine by itself, and that you see the stated error only when profiling with Nsight Compute? Or does the error also occur when the app itself is run “remotely”? Or are you using Nsight Eclipse Edition to run the app, and that’s when the error happens?


I edited the question to focus on the new error, which I am getting after updating the code. The image allocator issue seems to go away when I pass the kernel as a device kernel rather than a host one. Kindly help me with the new issue.
Moreover, I am running the code remotely on a Jetson using NVIDIA Nsight.


I don’t think this is related to Nsight Compute. If you feel this is an Nsight Eclipse Edition problem (i.e. if the problem only occurs when running the application under Nsight Eclipse Edition), please post this in

If you believe this is an issue with your usage of NPP, please move your question to

For Jetson specific questions, you can use