Use result of inference before inference has completely finished

simon.eichhammer · May 5, 2021, 9:36am

Hi,

I’m using a Jetson AGX Xavier with JetPack 4.3.
I generated a TensorRT engine and am using the C++ API.
To run inference on the engine I use the asynchronous method enqueue().

Is it possible to use part of the inference result in another stream before the inference has completely finished?

Thanks in advance.

dusty_nv · May 5, 2021, 1:38pm

Hi @simon.eichhammer, I believe you can do this by recording a cudaEvent to stream A, and then asynchronously waiting on that cudaEvent from stream B. This could be the rough flow:

tensorrt enqueue() on stream A
cudaEventRecord() on stream A
cudaStreamWaitEvent() on stream B
enqueue CUDA kernels on stream B

Here are the relevant APIs: