TensorRT Installation and Running Error on AWS EC2 Deep Learning AMI Instance

Description

Hello,

I have a Deep Learning AMI on AWS EC2 (Deep Learning AMI (Ubuntu 18.04) Version 48.0).

I need to use TensorRT on this.

I set up the NGC settings (API key), then pulled the TensorRT container (docker pull nvcr.io/nvidia/tensorrt:20.11-py3).

After that, when I try to run the Docker image with this command,

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorrt:20.11-py3

I get this error (see the attached screenshot).

When I try to run the Docker image with another command, I get a different error.

What should I do to solve this problem and run TensorRT on my EC2 instance?

This is an urgent problem, so please help me as soon as possible.

Thanks

Environment

TensorRT Version: TensorRT 7.2.1
GPU Type: Tesla K80
Nvidia Driver Version: 450.142.00
CUDA Version: container includes NVIDIA CUDA 11.1.0
CUDNN Version: container includes NVIDIA cuDNN 8.0.4
Operating System + Version: (Ubuntu 18.04) Version 48.0
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15.5
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

docker pull nvcr.io/nvidia/tensorrt:20.11-py3
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorrt:20.11-py3
docker run nvcr.io/nvidia/tensorrt:20.11-py3

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Please refer to the installation steps in the link below in case you are missing anything.

Also, we suggest you use the TRT NGC containers to avoid any system-dependency-related issues.

Thanks!

Hi,

I have already used the TRT NGC container, and my problem is not solved.

I get this error when I try to run the installed TRT container.

According to this page (Container Release Notes :: NVIDIA Deep Learning TensorRT Documentation),

TRT container includes:
-NVIDIA CUDA 11.1.0
-NVIDIA cuDNN 8.0.4
-NVIDIA NCCL 2.8.2

So, to begin with, I only need NVIDIA driver 455 or later.

Since I am using an EC2 Deep Learning AMI instance, it comes with an NVIDIA driver. I don't need any pre-installation for TensorRT, right? Or do I need extra installation for this purpose?
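For reference, the release notes cited in this thread require driver 455 or later, while the Environment section above lists driver 450.142.00. A minimal sketch of comparing the two (the version strings below are the ones from this thread; on the instance itself the installed version would come from nvidia-smi):

```shell
# Compare the installed driver version against the container's minimum.
installed="450.142.00"   # from the Environment section of this thread
required="455"           # minimum for the 20.11 container per the release notes

# sort -V orders version strings numerically; if the smallest of the two
# is "required", the installed driver is new enough.
if [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "driver $installed satisfies the >= $required requirement"
else
  echo "driver $installed is older than the required $required"
fi
```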

Thanks

@NVES

hi,

Any suggestions for this issue? It is urgent for me.

thanks.

Hi @skilic ,
In the error shown in the screenshot, I see you are using the command as-is:

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorrt:xx.xx-py3

However, here you need to replace local_dir:container_dir with your host directory and mount directory, respectively.
You need to mount a path on your host machine into the container.
Can you please try that and let us know?
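As a sketch of what this looks like in practice (the host path and the mount point inside the container are made-up examples; any existing directory on the EC2 instance works as the host side):

```shell
# Hypothetical host directory; replace with any path on your EC2 instance.
HOST_DIR="$HOME/trt_workspace"
mkdir -p "$HOST_DIR"

# -v HOST:CONTAINER mounts the host directory inside the container;
# /workspace/host is an arbitrary example mount point.
docker run --gpus all -it --rm \
  -v "$HOST_DIR":/workspace/host \
  nvcr.io/nvidia/tensorrt:20.11-py3
```

Files written to /workspace/host inside the container then appear in $HOST_DIR on the host, and vice versa.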

Thanks!

Hello, @AakankshaS,

I tried it, but it failed.

I am working on an AWS EC2 Deep Learning AMI.

I couldn't find the host directory and mount directory. How can I find these directories?

By the way,

Here is the output of the docker images command:

[screenshot: docker images output]

And when I run this command,

docker run --gpus all nvcr.io/nvidia/tensorrt:20.11-py3

I get the TensorRT information output, but the container does not keep running.

I think I am missing some easy part but could not find it yet.

Could you help me please?

Thanks

Hi @skilic ,
I believe for the error you are getting, you can just point to any folder on the server as the host directory.

Hi @AakankshaS, I am working on an AWS Amazon Linux TensorFlow Deep Learning AMI EC2 instance. I am able to train a model, convert it to a TFLite model, and convert it to ONNX, but while trying TensorRT I get an error as shown in the image. It says TensorFlow is not built with TensorRT, but these are already installed by default. Can you let me know how to solve this issue?