nvinfer1::IExecutionContext::enqueue not asynchronous even with cudaStreamNonBlocking stream

Under what circumstances would nvinfer1::IExecutionContext::enqueue not be
asynchronous?

We call enqueue followed by cudaStreamSynchronize.
Almost all of the time is taken by enqueue, with very
little taken by cudaStreamSynchronize.
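For reference, a minimal sketch of how that measurement can be taken (assuming `context`, `buffers`, and `batchSize` are already set up elsewhere; this fragment requires a GPU and TensorRT to actually run):

```cpp
#include <chrono>
#include <cstdio>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// Assumed to exist: nvinfer1::IExecutionContext* context;
//                   void** buffers; int batchSize;

cudaStream_t stream;
cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

auto t0 = std::chrono::steady_clock::now();
context->enqueue(batchSize, buffers, stream, nullptr);  // expected to return quickly
auto t1 = std::chrono::steady_clock::now();
cudaStreamSynchronize(stream);                          // expected to absorb the GPU time
auto t2 = std::chrono::steady_clock::now();

auto ms = [](auto a, auto b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
};
std::printf("enqueue: %.2f ms, sync: %.2f ms\n", ms(t0, t1), ms(t1, t2));
// If nearly all of the time lands in enqueue, something inside the
// network is blocking the host thread.
```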

Hi,

Please refer to the link and examples below:

https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#perform_inference_c

Thanks

This is NOT an answer to my question.

I repeat my question here:

Under what circumstances would nvinfer1::IExecutionContext::enqueue not be
asynchronous?

We call enqueue followed by cudaStreamSynchronize.
Almost all of the time is taken by enqueue, with very
little taken by cudaStreamSynchronize.

The question is: why, in our case, is enqueue NOT asynchronous?
Is there anything that could make enqueue NOT asynchronous?

It can be due to multiple reasons:

  • The enqueue time can be similar to, or even longer than, the time the work actually takes on the GPU.
  • If there is a sync operation in a plugin, the whole network will behave synchronously.
  • It also depends on your network structure, some layers’ implementations, and even the CPU load.

Thanks

Many thanks. It appears that we have a plugin that syncs.