VRAM leak when calling NPP functions

When calling functions such as:

nppiCopyWrapBorder_16u_C1R_Ctx
nppiFilterMedian_16u_C1R_Ctx
nppiFilterGauss_32f_C1R_Ctx

from different threads, with the same Ctx set for each thread,
I found that some functions return success but do not produce the result I want. Then I noticed the leak.

With compute-sanitizer, I just got CUDA ERROR #719 (using FilterMedian kernel #2): unspecified launch failure.

I am running my code on an RTX A4000, driver 522.06, CUDA 11.8, with ECC on.

Running your code under compute-sanitizer and getting error 719 means you are doing something wrong, perhaps in the arrangement of arguments you are passing to the function.

Given that, declaring a leak or not isn’t sensible, in my opinion. If your code is performing illegal behavior, you should fix that first.

Thanks, but without compute-sanitizer, all functions return npp no error, and the results are correct.
What should I do then?

That seems to contradict your earlier statement that some functions return success but do not give the result you want.

I have to admit I’m not really sure what your situation is. You’ve given no indication of how you are determining there is a leak, and you seem to be making contradictory statements about whether you are getting the results you expect, or not.

In any event, if you run a particular NPP function call, and under compute-sanitizer you get an error 719, then you should double-check all arguments you are passing for correctness.
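As an illustration of what double-checking the arguments could look like for a border-copy call, here is a minimal host-side sketch. Everything in it is hypothetical: `Size2D` stands in for `NppiSize` so the snippet compiles without the NPP headers, `check_wrap_border_args` is not an NPP function, and the rules it encodes (steps are in bytes and cover at least one row, the destination is at least as large as the source plus the top/left borders) are assumptions about what a consistent call looks like, not a copy of NPP's own validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for NppiSize (not the NPP type itself).
struct Size2D { int width; int height; };

// Hypothetical helper: returns an empty string when the arguments look
// consistent, otherwise a description of the first problem found.
// All steps are expressed in bytes, as NPP image functions expect.
std::string check_wrap_border_args(Size2D src, int srcStep,
                                   Size2D dst, int dstStep,
                                   int topBorder, int leftBorder,
                                   std::size_t pixelBytes = sizeof(std::uint16_t)) {
    if (src.width <= 0 || src.height <= 0) return "source ROI is empty";
    if (dst.width <= 0 || dst.height <= 0) return "destination ROI is empty";
    if (topBorder < 0 || leftBorder < 0)   return "border size is negative";
    if (srcStep <= 0 || dstStep <= 0)      return "step is not positive";
    // A row step must cover at least one full row of pixels.
    if (static_cast<std::size_t>(srcStep) < src.width * pixelBytes)
        return "source step is shorter than one source row";
    if (static_cast<std::size_t>(dstStep) < dst.width * pixelBytes)
        return "destination step is shorter than one destination row";
    // Assumption: the destination must hold the source plus top/left borders.
    if (dst.width < src.width + leftBorder)  return "destination is too narrow";
    if (dst.height < src.height + topBorder) return "destination is too short";
    return "";
}
```

Calling something like this right before the NPP call, with the same values you pass to it, makes a swapped or stale argument show up as a readable message instead of a launch failure.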

Here is what I did:

uint16_t *x,*y;
int h;
int w;
int step;
NppStatus s;
...

// malloc with cudaMalloc (cudaMalloc takes the address of the pointer)
cudaMalloc(&x, h*w*sizeof(uint16_t));
cudaMalloc(&y, (h+2)*w*sizeof(uint16_t));
s = nppiCopyWrapBorder_16u_C1R_Ctx(x, w*sizeof(uint16_t), {w,h}, y, w*sizeof(uint16_t), {w,h+2}, 1, 0, Ctx);

// malloc with nppiMalloc (step receives the row pitch in bytes)
x = nppiMalloc_16u_C1(w, h, &step);
y = nppiMalloc_16u_C1(w, h+2, &step);
s = nppiCopyWrapBorder_16u_C1R_Ctx(x, step, {w,h}, y, step, {w,h+2}, 1, 0, Ctx);

When allocating with cudaMalloc, y is what I want,
but when allocating with nppiMalloc, y is not what I want.
I check the memory info with cudaMemGetInfo before and after the call to nppiCopyWrapBorder_16u_C1R_Ctx.

Thanks for your help
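A single before/after reading around one call may include one-time allocations made on first use, so it does not by itself show a leak. A rough sketch of a more robust measurement is below: run the call a few times as warm-up, then compare free memory across many further iterations. The helper `net_free_drop` and its parameters are hypothetical names for illustration; it is written against plain callbacks so it compiles without CUDA.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: runs `op` `warmup` times without measuring, then
// `iters` more times, and returns how many bytes of free memory disappeared
// across the measured iterations (free before minus free after).
long long net_free_drop(const std::function<std::size_t()>& queryFree,
                        const std::function<void()>& op,
                        int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) op();   // absorb one-time allocations
    const std::size_t before = queryFree();
    for (int i = 0; i < iters; ++i) op();    // steady-state calls
    const std::size_t after = queryFree();
    return static_cast<long long>(before) - static_cast<long long>(after);
}
```

In real use, `queryFree` would presumably call cudaDeviceSynchronize() and then cudaMemGetInfo(&freeBytes, &totalBytes) and return freeBytes, and `op` would be the nppiCopyWrapBorder_16u_C1R_Ctx call. A result that grows in proportion to `iters` suggests a leak; a result of zero after warm-up suggests the drop seen earlier was a one-time cost.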

I wouldn’t be able to comment further without a complete example.