createExecutionContext takes too long in TensorRT 8.0.3.4

Description

TrtUniquePtr<nvinfer1::IExecutionContext> context(m_engine->createExecutionContext());

This line runs normally with TensorRT 7.2.3.4 + CUDA 11.1 and takes about 2 ms, but it takes about 300 ms with TensorRT 8.0.3.4 + CUDA 11.2. The engines in both environments were converted from ONNX without errors.
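
For anyone trying to reproduce the numbers, the call can be timed in isolation with std::chrono. This is only a minimal sketch: `elapsedMs` is a hypothetical helper, not part of TensorRT, and the usage comment assumes `TrtUniquePtr` is a `std::unique_ptr` alias with a custom deleter, which the post does not show.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of TensorRT): runs `fn` once and returns
// the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsedMs(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Usage against the call from this post. It needs NvInfer.h and a
// deserialized engine, so it is shown as a comment only. It assumes
// TrtUniquePtr behaves like std::unique_ptr (has reset()).
//
//   TrtUniquePtr<nvinfer1::IExecutionContext> context;
//   const double ms = elapsedMs([&] {
//       context.reset(m_engine->createExecutionContext());
//   });
//   std::cout << "createExecutionContext: " << ms << " ms\n";
```

Timing a second context created from the same engine in the same process may also help show whether the 300 ms is a one-time cost or is paid on every call.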

Environment

TensorRT Version: 7.2.3.4 + CUDA 11.1; 8.0.3.4 + CUDA 11.2
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 470.141.03
CUDA Version: 11.1 (with TensorRT 7.2.3.4); 11.2 (with TensorRT 8.0.3.4)
CUDNN Version: 8.1.0 in both environments
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag):

Hi,
Please refer to the link below for the sample guide.

Refer to the installation steps from the link in case you are missing anything.

However, the suggested approach is to use the TRT NGC containers to avoid any system dependency issues.

To run the Python samples, make sure the TRT Python packages are installed when using the NGC container:
/opt/tensorrt/python/python_setup.sh

If you are trying to run a custom model, please share your model and script with us so that we can assist you better.
Thanks!