vpiStreamCreateWrapperCUDA with CUDA kernels

I want to write some CUDA kernels that work on the output of a VPI pipeline.

I was thinking of creating a cudaStream_t cuda_stream, wrapping it in a VPI stream with vpiStreamCreateWrapperCUDA, and then doing something like this:

```cpp
// ...VPI pipeline...
vpiSubmitStereoDisparityEstimator(vpi_stream, VPI_BACKEND_CUDA, /* ... */);
myCudaKernel<<<grid, block, 0, cuda_stream>>>(/* ... */);
```
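For context, here is roughly the complete setup I have in mind. This is just a minimal sketch: myCudaKernel is a stand-in for my post-processing, the actual stereo submission (payload, images, params) is left as a comment, flags are 0, error checking is omitted, and I'm assuming vpiStreamCreateWrapperCUDA is declared in vpi/Stream.h in my VPI version.

```cpp
#include <cuda_runtime.h>
#include <vpi/Stream.h>   // assuming this declares vpiStreamCreateWrapperCUDA in my VPI version

// Stand-in for the post-processing kernel I actually want to run.
__global__ void myCudaKernel()
{
    // ...consume the disparity output here...
}

int main()
{
    // Plain CUDA stream that I also want to use for my own kernels.
    cudaStream_t cuda_stream;
    cudaStreamCreate(&cuda_stream);

    // Wrap it so VPI's CUDA backend submits its work into this same stream.
    VPIStream vpi_stream = nullptr;
    vpiStreamCreateWrapperCUDA(cuda_stream, 0, &vpi_stream);

    // ...VPI pipeline, e.g.:
    // vpiSubmitStereoDisparityEstimator(vpi_stream, VPI_BACKEND_CUDA, /* ... */);

    // The part I'm unsure about: can I enqueue my own kernel on cuda_stream
    // right here, or do I need a vpiStreamSync(vpi_stream) first?
    myCudaKernel<<<1, 256, 0, cuda_stream>>>();

    // Wait for everything before tearing down.
    cudaStreamSynchronize(cuda_stream);

    vpiStreamDestroy(vpi_stream);
    cudaStreamDestroy(cuda_stream);
    return 0;
}
```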

However, the documentation says: “CUDA kernels can only be submitted directly to cudaStream_t if it’s guaranteed that all tasks submitted to VPIStream are finished.” Does that mean I have to call vpiStreamSync before launching my CUDA kernel?
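In code, that strict reading would mean something like this (reusing the hypothetical names from the sketch above):

```cpp
// Strict reading: drain everything VPI has queued on the stream first...
vpiStreamSync(vpi_stream);
// ...and only then submit my own work to the underlying cudaStream_t.
myCudaKernel<<<1, 256, 0, cuda_stream>>>();
```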

If so, what is the point of wrapping the CUDA stream in a VPI stream? If I have to call vpiStreamSync anyway, I might as well just create a new, non-wrapped VPI stream.

Or is “guaranteed that all tasks submitted to VPIStream are finished” already satisfied here, because the previous VPI operation uses the CUDA backend and therefore runs on the same cudaStream_t, so it is implicitly guaranteed to finish before my CUDA kernel starts executing?

Also, a follow-up question: if I call vpiStreamSync on a wrapped CUDA stream, does that also call cudaStreamSynchronize under the hood? And conversely, is calling cudaStreamSynchronize on the wrapped cudaStream_t equivalent to calling vpiStreamSync (given that the only backend used is CUDA)?
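Put differently, with the names from the sketch above, I'm asking whether these two calls end up being interchangeable when only the CUDA backend is in play:

```cpp
vpiStreamSync(vpi_stream);            // does this boil down to a cudaStreamSynchronize on cuda_stream?
cudaStreamSynchronize(cuda_stream);   // and is this enough to consider the wrapped VPIStream synced?
```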
