Consider the following usage pattern:
1. Create a CUDA stream, then create a VPI stream wrapping that CUDA stream.
2. Allocate CUDA memory (e.g. via OpenCV’s cv::cuda::GpuMat), then create a VPI image wrapping that memory.
3. Use the newly wrapped VPI image as the output location for some VPI function (that is, fill that image with data).
4. Do some processing on the raw CUDA memory (e.g. memcpy to host, do something, memcpy back). Synchronize the CUDA stream.
5. Use the same wrapped VPI image from (2) as an input to some VPI function.
Currently I’m facing various segfaults when this pattern runs in a multithreaded environment. However, if I also do a VPI synchronization at (4), everything seems to work more or less correctly (at least it does not segfault).
So, the question is: does VPI synchronization on the wrapping stream also synchronize the underlying CUDA stream?
Related question: does indirect VPI synchronization via vpiSubmitHostFunctionEx have the same semantics as if the CUDA stream had also been synchronized by that call?