Calling TensorRT's Builder or Runtime for the first time takes much longer in a Docker environment than on the host

Description

With the official NGC TensorRT Docker image, the first call to tensorrt.Runtime(TRT_LOGGER) or trt.Builder(TRT_LOGGER) from the Python interface takes almost 20 seconds.

Even with the C++ interface, the first call to nvinfer1::createInferBuilder also takes a long time.

On the host machine, the same Python call takes less than 2 seconds.

Environment

Docker environment: NGC TensorRT container 20.09
TensorRT Version: 7.1.3
GPU Type: P100
Nvidia Driver Version: 465.24.02
CUDA Version: 11.3
CUDNN Version:
Operating System + Version: Ubuntu 18.04 (host)
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if container which image + tag): Container (NGC TensorRT 20.09)

Relevant Files

test_runtime.py (attached in a reply below)

Steps To Reproduce

Python code:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
# builder = trt.Builder(TRT_LOGGER)  # the first Builder call shows the same delay

C++ code (the sampleINT8 example's build function, with log statements added around builder creation):
bool SampleINT8::build(DataType dataType)
{
    sample::gLogInfo << "before builder initialize" << std::endl;
    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));
    sample::gLogInfo << "after builder initialize" << std::endl;

Below is the log; the call took 27 seconds:
[06/11/2021-09:25:04] [I] before builder initialize
[06/11/2021-09:25:31] [I] after builder initialize


Hi @quansm,

We recommend you try the latest TensorRT container (tensorrt:21.05-py3) and let us know if you still face the same issue.

Thank you.

Thank you for your reply!

I have tried nvcr.io/nvidia/tensorrt:21.05-py3, but the problem is still there.

I suspect the problem is related to my Docker environment, because there is no such problem in the local environment. These function calls do not involve any data or models, so the issue is more likely related to TensorRT's runtime environment.
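
For illustration, here is a minimal sketch (not part of the original post) that separates the module import cost from the first constructor call, which can help confirm that the delay is in TensorRT's initialization rather than in any data loading:

import time

t0 = time.time()
import tensorrt as trt  # time the module import itself
print("import time: %.4fs" % (time.time() - t0))

t0 = time.time()
# the first constructor call pays any one-time initialization cost
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
print("first Runtime creation time: %.4fs" % (time.time() - t0))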

My Docker environment:
nvidia-docker version
NVIDIA Docker: 2.6.0
Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:56:40 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: b0f5bc3
Built: Wed Jun 2 11:54:48 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.6
GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc:
Version: 1.0.0-rc95
GitCommit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
docker-init:
Version: 0.19.0
GitCommit: de40ad0

Below is the log:

nvidia-docker run --rm -it -v /data/data_public/lunglobe_trt:/workspace/lung -e NVIDIA_DRIVER_CAPABILITIES=all nvcr.io/nvidia/tensorrt:21.05-py3

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.05 (build 22596545)

NVIDIA TensorRT 7.2.3 (c) 2016-2021, NVIDIA CORPORATION. All rights reserved.
Container image (c) 2021, NVIDIA CORPORATION. All rights reserved.

https://developer.nvidia.com/tensorrt

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

root@f41cb6a3f822:/workspace# ls
NVIDIA_Deep_Learning_Container_License.pdf README.md lung tensorrt
root@f41cb6a3f822:/workspace# cd lung
root@f41cb6a3f822:/workspace/lung# python test_runtime.py
begin create runtime…
runtime creation time:57.3871s
begin create second runtime…
second runtime creation time:0.0001s
begin create builder…
builder creation time:0.1143s
root@f41cb6a3f822:/workspace/lung# python test_runtime.py
begin create runtime…
runtime creation time:25.2602s
begin create second runtime…
second runtime creation time:0.0000s
begin create builder…
builder creation time:0.0006s
root@f41cb6a3f822:/workspace/lung# python test_runtime.py
begin create runtime…
runtime creation time:25.1916s
begin create second runtime…
second runtime creation time:0.0000s
begin create builder…
builder creation time:0.0006s

test_runtime.py (790 Bytes)
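
The attached test_runtime.py is not reproduced in the thread; a minimal sketch that would produce output matching the logs above might look like the following (the actual attached script may differ):

import time
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

print("begin create runtime…")
t0 = time.time()
runtime = trt.Runtime(TRT_LOGGER)  # first call: slow inside the container
print("runtime creation time:%.4fs" % (time.time() - t0))

print("begin create second runtime…")
t0 = time.time()
runtime2 = trt.Runtime(TRT_LOGGER)  # second call: near-instant
print("second runtime creation time:%.4fs" % (time.time() - t0))

print("begin create builder…")
t0 = time.time()
builder = trt.Builder(TRT_LOGGER)  # also fast once initialization has happened
print("builder creation time:%.4fs" % (time.time() - t0))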

Hi @quansm,

We tried running the script you've shared, and it executed in ~11 seconds; it did not take as long as you observed. Could you please check that enough memory is available on your system, and try running it after closing other applications?

Thank you.

Thank you for your reply!

I'm new to TensorRT, so I don't know how long this call should take. If you think 11 seconds is normal, then the ~20 seconds on my machine may be acceptable; it's not a problem.

Actually, I have run the test on several machines, but the environments were not very consistent, so I didn't mention them. On one machine, which also has TensorRT installed in a host conda Python environment, the call returned very quickly. Below is the log:

(py36) xxx@XXX:~/TOOLS$ python test_runtime.py
tensorrt version: 7.0.0.11
begin create runtime…
runtime creation time:0.2672s
runtime: <tensorrt.tensorrt.Runtime object at 0x7f40bd1c81f0>
begin create second runtime…
second runtime creation time:0.0000s
begin create builder…
builder creation time:0.0002

The machine above has two GeForce 2080 GPUs, and the same call in its Docker environment also takes ~11 seconds. Why is the call so fast in the host environment?

Hi @quansm,

This could also be due to the CUDA JIT compiler cache (~/.nv/ComputeCache): it wouldn't be persistent across different container launches, but it would be on bare metal.
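
If the compute cache is indeed the cause, one possible workaround (not verified in this thread) is to persist it across container runs with a bind mount, since ~/.nv/ComputeCache is an ordinary directory inside the container; /data/trt_cache below is a hypothetical host path:

nvidia-docker run --rm -it -v /data/trt_cache:/root/.nv/ComputeCache nvcr.io/nvidia/tensorrt:21.05-py3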

You can try the latest NGC container with TensorRT v8.0; we resolved similar issues in the latest version. Hopefully this issue is not a blocker for you.

Thank you.

Hi @spolisetty ,

Thank you for your reply!

This problem is not a big deal for a continuously running service, so I can ignore it in my case.

By the way, I checked ~/.nv/ComputeCache on the host, and there is nothing in it. When I delete the directory and run the script again, the directory is recreated but stays empty. In the earlier tests in the Docker environment, calling the script multiple times did not change the elapsed time much either.
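
One way to confirm where the JIT cache is actually written is to redirect it with the standard CUDA_CACHE_PATH environment variable and inspect the directory afterwards (a suggestion, not something tested in this thread; /tmp/trt_jit_cache is a hypothetical path):

CUDA_CACHE_PATH=/tmp/trt_jit_cache python test_runtime.py
ls -la /tmp/trt_jit_cache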

I will try TensorRT v8.0 sometime later.
Thank you again!

When testing in Docker with TensorRT 8.0 EA, the function call is very fast, so updating to TensorRT 8.0 may be the solution for this problem in my case.

Below is the test log:

root@3dc504353c12:/workspace/lung# python test_runtime.py
tensorrt version 8.0.0.3
begin create runtime…
runtime creation time:1.9336s
begin create second runtime…
second runtime creation time:0.0001s