Very basic question here, but it doesn't seem to be directly addressed in the documentation. I'm using the CUDA runtime API and NPP, with the async versions of calls wherever possible. In a number of places I allocate a video buffer with cudaMallocPitch(), initialize its contents with a pattern using an NPP Ctx call, and then use that buffer as input to a different NPP Ctx routine that composites it with another image. Do I need a synchronization point between the buffer write and the buffer read, and if so, is cudaStreamSynchronize the right mechanism? Would the answer change if I were using my own kernel to do the memory initialization?
Do I need a cudaStreamSynchronize between an async write to GPU memory and an async read of same memory?
What is the “buffer write”? Is that the “initialize the contents…” step?
What is the “buffer read”? Is that the 2nd NPP Ctx call?
If so, and you use the Ctx API correctly and the same stream throughout, you should not need a synchronization operation between the two NPP Ctx calls. They obey stream semantics: work issued into a single stream executes in issue order, so the write is guaranteed to complete before the read begins.
The answer would not change if you used your own kernel instead, provided you launch that kernel into the same stream that is identified in the NPP Ctx.
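To make the pattern concrete, here is a minimal sketch of the same-stream approach. It is illustrative, not tested against a particular NPP version: only the `hStream` field of `NppStreamContext` is filled in (real code should populate the device fields too, or start from `nppGetStreamContext()`), and `nppiAdd_8u_C1RSfs_Ctx` stands in for whatever compositing routine you are actually calling.

```cuda
#include <cuda_runtime.h>
#include <npp.h>

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Build an NPP stream context tied to our stream. (Assumption: in real
    // code the remaining device fields should be populated as well, e.g. by
    // calling nppGetStreamContext() and then overriding hStream.)
    NppStreamContext ctx = {};
    ctx.hStream = stream;

    NppiSize roi = {1920, 1080};
    Npp8u *buf = nullptr;
    size_t pitch = 0;
    cudaMallocPitch(&buf, &pitch, roi.width, roi.height);

    // Buffer WRITE: async initialization with a pattern, enqueued on `stream`.
    nppiSet_8u_C1R_Ctx(0x80, buf, (int)pitch, roi, ctx);

    // Buffer READ: a second NPP Ctx call on the same stream. No
    // cudaStreamSynchronize is needed between the two calls, because work
    // issued to one stream executes in issue order. (nppiAdd_8u_C1RSfs_Ctx
    // is a placeholder for your compositing routine; `other`/`dst` buffers
    // would be allocated the same way as `buf`.)
    //
    // nppiAdd_8u_C1RSfs_Ctx(buf, (int)pitch, other, (int)otherPitch,
    //                       dst, (int)dstPitch, roi, 0, ctx);

    // Synchronize only where the HOST needs the result, e.g. before reading
    // the output back with a host-visible copy.
    cudaStreamSynchronize(stream);

    cudaFree(buf);
    cudaStreamDestroy(stream);
    return 0;
}
```

The same holds if the initialization step is your own kernel launched with `<<<grid, block, 0, stream>>>`: as long as every producer and consumer targets the one stream named in the Ctx, stream ordering replaces explicit synchronization.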
You are correct as to what I was trying to say. Thanks!