Does vpiStreamSync on a wrapped CUDA stream synchronize the underlying CUDA stream?

Consider the following usage pattern:

  1. Create a CUDA stream, then create a VPI stream wrapping that CUDA stream.
  2. Allocate CUDA memory (e.g. via OpenCV’s cv::cuda::GpuMat), then create a VPI image wrapping that memory.
  3. Use the newly wrapped VPI image as the output of some VPI function (that is, have VPI fill the image with data).
  4. Do some processing on the raw CUDA memory (e.g. memcpy to host, do something, memcpy back), then synchronize the CUDA stream.
  5. Use the same wrapped VPI image from (2) as an input to some VPI function.

Currently I’m seeing various segfaults when this pattern runs in a multithreaded environment. However, if I do a VPI synchronization in step (4), everything seems to work more or less correctly (at least it does not segfault).

So the question is: does VPI synchronization on a wrapped stream also synchronize the underlying CUDA stream?

Related question: does indirect VPI synchronization via vpiSubmitHostFunctionEx have the same semantics, as if the CUDA stream had also been synchronized by that call?


Yes, it will.
Do you get any errors after using vpiStreamSync?


No, but I want to know whether that’s just a coincidence or whether explicitly syncing the VPI stream also synchronizes the CUDA stream.

The same applies to vpiSubmitHostFunctionEx, as I use it to do stream synchronization in a program that is built around boost::fibers.
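
To make clear what I mean by indirect synchronization, here is a toy model in plain C++. It is not VPI code: ToyStream, submit and fill_and_wait are made-up names, and the single worker thread only stands in for an in-order stream. The last submitted task plays the role of the host function; once it has run, everything queued before it has finished. Whether the real vpiSubmitHostFunctionEx gives the same guarantee for the wrapped CUDA stream is exactly what I am asking.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Toy in-order "stream" (hypothetical, NOT the VPI API):
// one worker thread runs the submitted tasks strictly in submission order.
class ToyStream {
public:
    ToyStream() : worker_([this] { run(); }) {}

    ~ToyStream() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // drains the remaining tasks, then stops
    }

    // Enqueue a task; returns immediately, like an asynchronous submit.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;  // shutdown requested, nothing left
                task = std::move(q_.front());
                q_.pop_front();
            }
            task();  // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool done_ = false;
    std::thread worker_;  // declared last: starts after the other members exist
};

// Queue n "processing" tasks, then a final host-function task that fulfils a
// promise. Waiting on the future is the indirect synchronization point.
std::vector<int> fill_and_wait(int n) {
    assert(n >= 0);
    std::vector<int> out;
    std::promise<void> reached;
    std::future<void> ready = reached.get_future();
    ToyStream stream;

    for (int i = 0; i < n; ++i)
        stream.submit([&out, i] { out.push_back(i * i); });

    // Stand-in for the host function submitted after the real work.
    stream.submit([&reached] { reached.set_value(); });

    ready.wait();  // all earlier tasks have completed once this returns
    return out;    // safe to read: the worker no longer touches `out`
}
```

In my program the promise/future pair comes from boost::fibers instead of std, so the wait suspends only the current fiber rather than blocking the whole thread.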