TensorRT v21.12-py3 Docker image does not work with the GPU option on an ARM (AGX) device

Description

I found the TensorRT v21.12-py3 Docker image on NGC, which is listed as supporting two platforms (amd64 and arm64), so I tried pulling it on my AGX device.

If I run it with --gpus, it fails:

nvidia@AGX-00044bcc0f04:~$ docker run --gpus all -it nvcr.io/nvidia/tensorrt:21.12-py3 bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --video --require=cuda>=9.0 --pid=29265 /var/lib/docker/overlay2/d3135262cb32659066b189443a9169ff227807300914a0bcee15be2cc2c0d6dc/merged]
nvidia-container-cli: mount error: mount operation failed: /usr/src/tensorrt: no such file or directory: unknown.
ERRO[0004] error waiting for container: context canceled 

If I run it without --gpus, I can get inside the container (but obviously TensorRT cannot be used without the GPU):

nvidia@AGX-00044bcc0f04:~$ docker run -it nvcr.io/nvidia/tensorrt:21.12-py3 bash

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.12 (build 29870938)
NVIDIA TensorRT Version 8.2.1
Copyright (c) 2016-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Container image Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

Is there any information about this issue?

Environment

  • Device : AGX
  • JetPack v4.6

Thank you so much!!

BR,
Chieh


Hi,
Could you share the ONNX model and the script, if not already shared, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

# Load the ONNX model given on the command line and run the checker
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, please share the trtexec --verbose log for further debugging (an example invocation is sketched below).
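
For reference, a minimal trtexec invocation might look like the line below; model.onnx is just a placeholder for your own model file:

trtexec --onnx=model.onnx --verbose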
Thanks!

hi @NVES

I think my question is not about ONNX models or any code, right?

Hi,

Are you still facing this issue ?

hi @spolisetty,

Yes! The Docker image information on NGC shows that it supports the ARM64 architecture, but the container does not work with the GPU option.

@Chieh, I believe there might be some problem with your nvidia-docker setup. I tried the same command on my desktop and it worked as expected:

$ docker run --gpus all -it nvcr.io/nvidia/tensorrt:21.12-py3 bash

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.12 (build 29870938)
NVIDIA TensorRT Version 8.2.1
Copyright (c) 2016-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Container image Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

root@b483a9ff22ba:/workspace# 

What leads me to believe this might be an nvidia-docker setup issue is the following line:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --video --require=cuda>=9.0 --pid=29265 /var/lib/docker/overlay2/d3135262cb32659066b189443a9169ff227807300914a0bcee15be2cc2c0d6dc/merged]

Can you try running a docker command like the following to check your GPU+docker setup?

$ docker run --rm --gpus all nvidia/cuda:11.0-runtime-ubuntu20.04 nvidia-smi

hi @ework

Thanks for your reply!

  1. Could I ask which JetPack version you installed on your Jetson device?
  2. On my Jetson device, other images (the NVIDIA L4T series), which are built specifically for Jetson ARM devices, work well. The problem happens only when I use this TensorRT v21.12-py3 image.
  3. As far as I know, the nvidia/cuda:11.0-runtime-ubuntu20.04 image and the nvidia-smi command are for x86_64 (amd64) devices; they cannot work on Jetson (ARM) devices, CMIIW. I mentioned my testing environment in my post. Mainly, I am curious whether Jetson devices can truly use an image that NGC claims supports both the ARM and AMD64 architectures. (I highlighted it in my picture.)

I still tested your command anyway; as expected, it does not work.

Here is the output:

nvidia@AGX-00044bcc0f04:~$ docker run --rm --gpus all nvidia/cuda:11.0-runtime-ubuntu20.04 nvidia-smi
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "nvidia-smi": executable file not found in $PATH: unknown.
ERRO[0001] error waiting for container: context canceled 
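
As a side note, nvidia-smi is not shipped on Jetson, so on the host itself I usually check GPU activity with tegrastats instead:

sudo tegrastats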

Thank you.

@Chieh, sorry I didn’t see the part about Jetson AGX. I don’t have easy access to a Jetson AGX device to check so I won’t be able to try and reproduce your issue. I will try my best from the output you provide. Can you check if you have TensorRT installed on the device directly? /usr/src/tensorrt in particular comes from the libnvinfer-samples package. These Jetson containers are built in a strange way (not sure why), but they mount many files from outside the container into the container. I believe it’s to save space.
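
For example, something like the following on the AGX host should confirm whether the samples package and directory are present (assuming a standard JetPack install, where /usr/src/tensorrt comes from libnvinfer-samples):

dpkg -l | grep libnvinfer-samples
ls /usr/src/tensorrt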

Hi @ework,

No worries!! Thank you again!

Yes, you are right. As you mentioned, the containers mount some files, such as CUDA, from the host into the container.
My host environment, including TensorRT, works well after a complete JetPack install. Everything is good on my Jetson device except running that TensorRT image.

In this post, I just want to double-check whether I am using that image incorrectly, or whether I am misunderstanding what the ARM architecture label means for the TensorRT containers on NGC.

@Chieh, this is a common customer complaint: even though the containers come from NGC, they do not work the same way as the x86 containers. We have suggested that they be normal standalone containers so that usage becomes easier, even if they take more space. It is not obvious that host packages are needed to run the container, and it is easy to miss that step in the process. I'm glad to hear you have it working now.

Not really.
Everything is working well except for that image (nvcr.io/nvidia/tensorrt:21.12-py3) on the AGX, currently.

@Chieh Hello, did you find any solution to that problem?
I'm in the middle of exactly the same issue with the nvcr.io/nvidia/pytorch:21.12-py3 image on the same Jetson AGX with JetPack 4.6.
Actually, I'm trying to run the NeMo project (GitHub - NVIDIA/NeMo: a toolkit for conversational AI) and wonder if you have any insight, maybe another image that would let me run NeMo on the AGX.

@user10

Unfortunately, not yet.
So I want to confirm whether the "ARM" label really means the image can be used on the Jetson series or not.
FYI, if you just want to use a Docker image with PyTorch on a Jetson device, you can try this one.
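
As a rough sketch (the tag below is an assumption; pick the l4t-pytorch tag on NGC that matches your L4T/JetPack release), running a Jetson-specific PyTorch image typically looks like:

docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3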

Does your host AGX have the /usr/src/tensorrt directory?
In my case, after I renamed this directory:

/usr/src$ sudo mv tensorrt tensorrt_org

I could run the Docker image (nvcr.io/nvidia/tensorflow:23.02-tf1-py3).
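
If it helps, on JetPack the host files that get mounted into containers are listed in CSV files; assuming the default location, you can see which entries reference TensorRT with something like:

grep -ri tensorrt /etc/nvidia-container-runtime/host-files-for-container.d/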