VPI <=> CUDA stream and event interop

Hello,

VPI 0.4 features the possibility to wrap a CUDA stream into a VPI stream via vpiStreamCreateCudaStreamWrapper(). The documentation states:

CUDA kernels can only be submitted directly to cudaStream_t if it’s guaranteed that all tasks submitted to VPIStream are finished.

Which I read as: I cannot asynchronously enqueue a VPI algorithm on a VPI-wrapped CUDA stream and then immediately launch a CUDA kernel on the underlying CUDA stream. Is that the case?

Guaranteeing that all tasks in the VPI stream have finished can be done synchronously with vpiStreamSync(), but that also blocks the host thread. Events would be a better solution, but event interop between VPI and CUDA is not implemented yet: vpiEventCreateCudaEventWrapper() always returns VPI_ERROR_NOT_IMPLEMENTED.

Hi,

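// Create a plain CUDA stream for your own kernels.
cudaStream_t cuda_stream;
cudaStreamCreate(&cuda_stream);

// Wrap the same CUDA stream in a VPI stream for the VPI algorithms.
VPIStream stream = NULL;
vpiStreamWrapCuda(cuda_stream, &stream);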

Please use cuda_stream to launch your own GPU tasks, and stream for the VPI algorithms.

Thanks.