Hello,
I am trying to use nppiCopy_16s_C1R_Ctx() on Windows 10,
and I am seeing a memory leak when I call it from worker (non-main) threads.
Pseudocode:

main()
    for (int i = 0; i < 1000; i++) {
        cudaMallocPitch()        // Allocate src buffer
        cudaMallocPitch()        // Allocate dest buffer
        cudaStreamCreate()       // Stream [A] for the _Ctx call
        Setup_NppStream          // Set up NppStreamContext with stream [A]
        createThreadNppiCopy()   // Create a thread which calls nppiCopy_16s_C1R_Ctx()
        cudaStreamDestroy()
        cudaFree()               // Deallocate src buffer
        cudaFree()               // Deallocate dest buffer
    }

createThreadNppiCopy()
    CreateThread()           ==> threadNppiCopy()
    WaitForSingleObject()    // Wait for threadNppiCopy() to complete
    CloseHandle()            // Close the handle created by CreateThread()

threadNppiCopy()
    callNppiCopy()

callNppiCopy()
    nppiCopy_16s_C1R_Ctx()   // Uses stream [A]
    cudaStreamSynchronize()  // Wait for stream [A]
If I call callNppiCopy() directly from the main thread instead of via createThreadNppiCopy(), I don’t see a memory leak.
Does NPP support being called from worker (non-main) threads?
My Environment:
- OS: Windows 10 Pro 1903 (ja)
- CUDA: 10.1 Update 2
- NPP: 10.2.0
- NVIDIA Graphics Driver: 431.70
- GPU: Quadro RTX 4000 (TCC mode)
- Compiler: VS2013
Thanks,
naoy4w