My mistake: I was not actually synchronizing the VPI stream within the stream capture.
- If I don’t synchronize, my dummy example finishes with an empty graph.
- If I synchronize my VPI stream before cudaStreamEndCapture(), I get the following error: VPI_ERROR_INTERNAL: (cudaErrorStreamCaptureUnsupported)
I tried several VPI calls and got the same error each time. My guess is that the VPI library as a whole doesn’t support CUDA graph capture. Is there any plan to make VPI support graph capture?
This could be a useful workaround for the current limitations of VPI when using a common stream for regular CUDA kernels and VPI calls. From the VPI library documentation:
CUDA kernels can only be submitted directly to cudaStream_t if it’s guaranteed that all tasks submitted to VPIStream are finished.
This issue is further explained here:
These limitations are quite challenging, as they force many synchronization points in complex pipelines, which hurts the performance of my application.
Best.