TensorRT v21.12-py3 Docker image does not work with the GPU option on an ARM (AGX) device

Description

I found the TensorRT v21.12-py3 Docker image on NGC, which is listed as supporting two platforms (amd64 and arm64), so I tried pulling it on my AGX device.

If I run it with --gpus, it fails:

nvidia@AGX-00044bcc0f04:~$ docker run --gpus all -it nvcr.io/nvidia/tensorrt:21.12-py3 bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --video --require=cuda>=9.0 --pid=29265 /var/lib/docker/overlay2/d3135262cb32659066b189443a9169ff227807300914a0bcee15be2cc2c0d6dc/merged]
nvidia-container-cli: mount error: mount operation failed: /usr/src/tensorrt: no such file or directory: unknown.
ERRO[0004] error waiting for container: context canceled 

If I run it without --gpus, I can get inside the container (but obviously TensorRT cannot be used without the GPU):

nvidia@AGX-00044bcc0f04:~$ docker run -it nvcr.io/nvidia/tensorrt:21.12-py3 bash

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.12 (build 29870938)
NVIDIA TensorRT Version 8.2.1
Copyright (c) 2016-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Container image Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

Is there any information about this issue?

Environment

  • Device : AGX
  • JetPack v4.6

Thank you so much!!

BR,
Chieh


Hi,
Could you share the ONNX model and the script, if not already shared, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

# Load the ONNX model given on the command line and run the checker
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, please share the trtexec --verbose log for further debugging (an example invocation is sketched below).
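
For reference, a minimal trtexec invocation might look like the line below; model.onnx is just a placeholder for your own model file:

trtexec --onnx=model.onnx --verbose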
Thanks!

hi @NVES

I think my question is not about ONNX models or any code, right?

Hi,

Are you still facing this issue ?

hi @spolisetty,

Yes! The Docker image information on NGC shows that it supports the ARM64 architecture, but the container does not work with the GPU option.

@Chieh, I believe there might be some problem with your nvidia-docker setup. I tried the same command on my desktop and it worked as expected:

$ docker run --gpus all -it nvcr.io/nvidia/tensorrt:21.12-py3 bash

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.12 (build 29870938)
NVIDIA TensorRT Version 8.2.1
Copyright (c) 2016-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Container image Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

root@b483a9ff22ba:/workspace# 

What leads me to believe this might be an nvidia-docker setup issue is the following line:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --video --require=cuda>=9.0 --pid=29265 /var/lib/docker/overlay2/d3135262cb32659066b189443a9169ff227807300914a0bcee15be2cc2c0d6dc/merged]

Can you try running a docker command like the following to check your GPU+docker setup?

$ docker run --rm --gpus all nvidia/cuda:11.0-runtime-ubuntu20.04 nvidia-smi

hi @ework

Thanks for your reply!

  1. Could I ask which JetPack version you installed on your Jetson device?
  2. On my Jetson device, other images (the NVIDIA L4T series), which are built specifically for Jetson ARM devices, work well. The problem happens only when I use this TensorRT v21.12-py3 image.
  3. As far as I know, the nvidia/cuda:11.0-runtime-ubuntu20.04 image and the nvidia-smi command are for x86_64 (amd64) devices; they cannot work on Jetson (ARM) devices, CMIIW. I mentioned my testing environment in my post. Mainly, I am curious whether Jetson devices can truly use an image that NGC claims supports both the ARM and AMD64 architectures. (I highlighted it in my picture.)

I still tested your command anyway; as expected, it does not work.

Here is the output:

nvidia@AGX-00044bcc0f04:~$ docker run --rm --gpus all nvidia/cuda:11.0-runtime-ubuntu20.04 nvidia-smi
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "nvidia-smi": executable file not found in $PATH: unknown.
ERRO[0001] error waiting for container: context canceled 
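
As a side note, nvidia-smi is not shipped on Jetson, so on the host itself I usually check GPU activity with tegrastats instead:

sudo tegrastats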

Thank you.

@Chieh, sorry I didn’t see the part about Jetson AGX. I don’t have easy access to a Jetson AGX device to check so I won’t be able to try and reproduce your issue. I will try my best from the output you provide. Can you check if you have TensorRT installed on the device directly? /usr/src/tensorrt in particular comes from the libnvinfer-samples package. These Jetson containers are built in a strange way (not sure why), but they mount many files from outside the container into the container. I believe it’s to save space.
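
For example, something like the following on the AGX host should confirm whether the samples package and directory are present (assuming a standard JetPack install, where /usr/src/tensorrt comes from libnvinfer-samples):

dpkg -l | grep libnvinfer-samples
ls /usr/src/tensorrt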

Hi @ework,

No worries!! Thank you again!

Yes, you are right. As you mentioned, the containers mount some files, such as CUDA, from the host into the container.
My host environment, including TensorRT, works well after a complete JetPack install. Everything is good on my Jetson device except running that TensorRT image.

In this post, I just want to double-check whether I am using that image incorrectly, or whether I am misunderstanding what the ARM architecture label means for the TensorRT containers on NGC.

@Chieh, this is a common customer complaint: even though the containers come from NGC, they do not work the same way as the x86 containers. We have suggested that they be normal standalone containers so that usage becomes easier, even if they take more space. It is not obvious that host packages are needed to run the container, and it is easy to miss that step in the process. I'm glad to hear you have it working now.

Not really.
Everything is working well except for that image (nvcr.io/nvidia/tensorrt:21.12-py3) on the AGX, currently.

@Chieh Hello, did you find any solution to that problem?
I'm in the middle of exactly the same issue with the nvcr.io/nvidia/pytorch:21.12-py3 image on the same Jetson AGX with JetPack 4.6.
Actually, I'm trying to run the NeMo project (GitHub - NVIDIA/NeMo: a toolkit for conversational AI) and wonder if you have any insight, maybe another image that would let me run NeMo on the AGX.

@user10

Unfortunately, not yet.
So I want to confirm whether the "ARM" label really means the image can be used on the Jetson series or not.
FYI, if you just want to use a Docker image with PyTorch on a Jetson device, you can try this one.
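
As a rough sketch (the tag below is an assumption; pick the l4t-pytorch tag on NGC that matches your L4T/JetPack release), running a Jetson-specific PyTorch image typically looks like:

docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3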

Does your host AGX have the /usr/src/tensorrt directory?
In my case, after I renamed this directory:

/usr/src$ sudo mv tensorrt tensorrt_org

I could run the Docker image (nvcr.io/nvidia/tensorflow:23.02-tf1-py3).
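
If it helps, on JetPack the host files that get mounted into containers are listed in CSV files; assuming the default location, you can see which entries reference TensorRT with something like:

grep -ri tensorrt /etc/nvidia-container-runtime/host-files-for-container.d/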