nppiCopyWrapBorder_16u_C1R_Ctx
nppiFilterMedian_16u_C1R_Ctx
nppiFilterGauss_32f_C1R_Ctx
…
from different threads, and I set the same Ctx for every thread.
I found that some functions return success but do not produce the result I want, and then I noticed the leak.
With compute-sanitizer, I only get CUDA error 719 (in FilterMedian kernel #2): unspecified launch failure.
I am running my code on an RTX A4000, driver 522.06, CUDA 11.8, with ECC on.
Running your code under compute-sanitizer and getting error 719 means you are doing something wrong, perhaps in the arrangement of the arguments you are passing to the function.
Given that, declaring a leak or not isn’t sensible, in my opinion. If your code is performing illegal behavior, you should fix that first.
I have to admit I’m not really sure what your situation is. You’ve given no indication of how you are determining that there is a leak, and you seem to be making contradictory statements about whether or not you are getting the results you expect.
In any event, if you run a particular npp function call, and under compute-sanitizer you get an error 719, then you should double-check all arguments you are passing for correctness.
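As a rough illustration of what "double-check all arguments" could look like for the border copy, here is a minimal host-side sketch. It is not NPP's own validation; `Size2D`, `borderCopyArgsLookValid` and `requiredBytes` are hypothetical names made up for this example, and the checks are only the generic ones (positive sizes, step at least one row of pixels, destination large enough for source plus border).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for NppiSize so the sketch compiles without NPP headers.
struct Size2D { int width; int height; };

// Returns true if the arguments look consistent for a 16-bit, single-channel
// border copy. These are generic sanity checks, not NPP's internal validation.
bool borderCopyArgsLookValid(Size2D src, int srcStepBytes,
                             Size2D dst, int dstStepBytes,
                             int topBorder, int leftBorder)
{
    const int pixelBytes = static_cast<int>(sizeof(std::uint16_t));
    if (src.width <= 0 || src.height <= 0) return false;
    if (topBorder < 0 || leftBorder < 0) return false;
    // A row step must cover at least the pixels in that row.
    if (srcStepBytes < src.width * pixelBytes) return false;
    if (dstStepBytes < dst.width * pixelBytes) return false;
    // The destination must be able to hold the border plus the source.
    if (dst.width < src.width + leftBorder) return false;
    if (dst.height < src.height + topBorder) return false;
    return true;
}

// Bytes a pitched buffer must provide for the given size and row step.
std::size_t requiredBytes(Size2D size, int stepBytes)
{
    assert(size.height >= 0 && stepBytes >= 0);
    return static_cast<std::size_t>(size.height) *
           static_cast<std::size_t>(stepBytes);
}
```

Comparing `requiredBytes` for the destination against the size that was actually allocated is a cheap way to catch a buffer that is too small for the ROI being passed.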
uint16_t *x,*y;
int h;
int w;
int step;
NppStatus s;
...
// variant 1: allocate with cudaMalloc (dense rows, step = w*sizeof(uint16_t))
cudaMalloc(&x, h*w*sizeof(uint16_t));
cudaMalloc(&y, (h+2)*w*sizeof(uint16_t));
s = nppiCopyWrapBorder_16u_C1R_Ctx(x, w*sizeof(uint16_t), {w,h}, y, w*sizeof(uint16_t), {w,h+2}, 1, 0, Ctx);
// variant 2: allocate with nppiMalloc (step is returned by the allocator)
x = nppiMalloc_16u_C1(w, h, &step);
y = nppiMalloc_16u_C1(w, h+2, &step);
s = nppiCopyWrapBorder_16u_C1R_Ctx(x, step, {w,h}, y, step, {w,h+2}, 1, 0, Ctx);
When I allocate with cudaMalloc, y is what I want,
but when I allocate with nppiMalloc, y is not what I want.
I check the memory info with cudaMemGetInfo before and after the call to nppiCopyWrapBorder_16u_C1R_Ctx.
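To make "y is what I want" checkable, one option is to compare the copied-back result against a small CPU reference. The sketch below is host-only C++ and rests on two assumptions that are not confirmed anywhere in this thread: that "wrap" means periodic continuation of the source image, and that the nppiMalloc buffers may have padded rows (step larger than `w*sizeof(uint16_t)`), so a pitched copy has to be unpacked before comparing. `wrapBorderReference` and `unpackPitched` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU reference for a wrap (periodic) border on a dense 16-bit image.
// Assumption: dst(y, x) = src((y - top) mod srcH, (x - left) mod srcW).
std::vector<std::uint16_t> wrapBorderReference(
    const std::vector<std::uint16_t>& src, int srcW, int srcH,
    int dstW, int dstH, int top, int left)
{
    assert(srcW > 0 && srcH > 0);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);
    std::vector<std::uint16_t> dst(static_cast<std::size_t>(dstW) * dstH);
    for (int y = 0; y < dstH; ++y) {
        // Double modulo keeps the index non-negative for rows above the source.
        const int sy = ((y - top) % srcH + srcH) % srcH;
        for (int x = 0; x < dstW; ++x) {
            const int sx = ((x - left) % srcW + srcW) % srcW;
            dst[static_cast<std::size_t>(y) * dstW + x] =
                src[static_cast<std::size_t>(sy) * srcW + sx];
        }
    }
    return dst;
}

// Removes row padding from a host copy of a pitched buffer so that it can be
// compared element by element with a dense reference image.
std::vector<std::uint16_t> unpackPitched(
    const std::vector<std::uint8_t>& pitched, int stepBytes, int w, int h)
{
    const std::size_t rowBytes = static_cast<std::size_t>(w) * sizeof(std::uint16_t);
    assert(static_cast<std::size_t>(stepBytes) >= rowBytes);
    assert(pitched.size() >= static_cast<std::size_t>(stepBytes) * h);
    std::vector<std::uint16_t> dense(static_cast<std::size_t>(w) * h);
    for (int row = 0; row < h; ++row) {
        std::memcpy(dense.data() + static_cast<std::size_t>(row) * w,
                    pitched.data() + static_cast<std::size_t>(row) * stepBytes,
                    rowBytes);
    }
    return dense;
}
```

If the device buffer is copied back as one flat block of `step*(h+2)` bytes, `unpackPitched` gives a dense image that can be compared with `wrapBorderReference(src, w, h, w, h+2, 1, 0)`. If the two match, the difference between the two variants may just be row padding in how y is read back rather than in the NPP call itself.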