TensorRT L4T Docker image Python version issue

Thanks for your reply,

I apologize for being unclear. This is about a misconfigured Python setup in your L4T TensorRT base image, unfortunately. It has nothing to do with the model; the model we are using works fine outside the container on the same version of TensorRT.

The issue is that our software uses OpenCV for some preprocessing (not my decision), and that isn’t easily installable with the default version of Python in the image (which does not match L4T’s).

I appreciate the move towards fully containerized solutions for Tegra, but your base image has too many differences from the x86 version to be useful.

A Dockerfile written for x86 should “just work” on Tegra, as it does with “ubuntu:latest”, alpine, or any number of other base images that support multiple architectures (via a manifest). The base image should be the same. The package set and repos should all be the same. If they’re not, it should be considered broken.

I do this regularly with non-NVIDIA Docker images: FROM ubuntu:latest ... and it works on everything. When I am forced to work with NVIDIA base images on Tegra, on the other hand, I spend more time working around breakage like this than actually developing software.

That’s not an exaggeration. It seems as if there is no quality control at all on this. You should have a series of Dockerfiles you test against both arm64 and amd64 base images. It’s clear that’s not done, and it’s very, very frustrating. On x86, your images are a pleasure; on Arm, very much the opposite.


Hi,

Have you tried the TensorRT NGC containers (NVIDIA NGC) to see if they serve your purpose?

Thank you.

I am referring to the NGC image. Sorry I wasn’t clear on this:

nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime

Hi,

This looks out of scope for TensorRT; maybe the following link will be helpful to you: python - “No module named 'cv2'” but it is installed - Stack Overflow

You can also try using the TensorRT NGC container, if it serves your purpose.
https://ngc.nvidia.com/containers/nvidia:tensorrt

Thank you.

It looks like I’m not being clear enough. The problem is with your TensorRT NGC image, nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime

The problem is, again, that there are two python3 installs in that image. Python packages installed through apt-get do not always work with this setup. Yes, you can pip install them, but there are supply chain issues there, and those packages are not tested against the rest of the system packages. Further, if a wheel isn’t found, it can take hours to build packages on Tegra.
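
To make that concrete, this is the kind of failure I mean (OpenCV is just the example we happen to hit; exact file names are from memory):

# inside nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime
apt-get update && apt-get install -y python3-opencv
# the apt package installs a CPython 3.6 extension module...
ls /usr/lib/python3/dist-packages/ | grep cv2    # e.g. cv2.cpython-36m-aarch64-linux-gnu.so
# ...which the image's default python3 (3.8) will not import:
python3 -c "import cv2"                          # ModuleNotFoundError: No module named 'cv2'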

Your x86 image uses Ubuntu 20.04 as a base. If you did the same for Tegra it would solve the issue entirely, since its Python is 3.8, but the Tegra image is Ubuntu 18.04 based (Python 3.6). If this somehow isn’t clear, I apologize (and give up, since it seems I’m not getting through).

@mdegans, are you saying that the L4T container has Python 3.8 installed even though the base OS (Ubuntu 18.04) uses Python 3.6? On bare metal (outside the container), the TensorRT Debian packages only support Python 3.6 on JetPack, since it’s the default system Python. See Support Matrix :: NVIDIA Deep Learning TensorRT Documentation

Yes. Exactly that. Some python3-foo packages will still work, depending on where their installer places them (see sys.path), but not all:

 $ sudo docker run -it --rm nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime
root@888659f49bd5:/# python3
Python 3.8.0 (default, Feb 25 2021, 22:10:10) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']
>>> exit()
root@888659f49bd5:/# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"

Again, apologies for not being clearer before. The image is apparently ubuntu:bionic based (like L4T) but has Python 3.8. That’s doable, but it breaks some Python packages distributed through apt-get. I assumed this was intentional, for compatibility with the x86 image, which is based off ubuntu:focal. I guess not.

I was originally going to say it was not intentional, but I think the build system is shared between x86 and aarch64, and that might have motivated the choice of Python 3.8. If it’s like the x86 container, then the Python bindings for TensorRT and other libraries are in /usr/local/... and installed from a whl file rather than Debian packages. As you stated, this means you will not be able to install packages using apt and have them work in the default python3 environment. You’ll need to use pip or other methods to install what you need.
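
Something along these lines should work inside the container (a sketch only; assuming pip is available for the 3.8 interpreter, otherwise get-pip.py can bootstrap it, and the package name is just an example):

# install into the image's default python3 (3.8) with pip
python3 -m pip install --upgrade pip
python3 -m pip install opencv-python    # uses an aarch64/cp38 wheel if one is published,
                                        # otherwise falls back to a (slow) source build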

The issue is that not all packages have pip wheels on arm64, and building them can take a very, very long time. Can you use ubuntu:focal as the base image on Tegra instead?

Also, I tend to trust packages from Canonical more than PyPI, given supply chain issues. The former are signed with Canonical’s GPG key. With the latter, I’m one typo away from installing malware.

I kind of doubt that. There are a bunch more differences between the arm64 and amd64 images than just this. A test suite of common Dockerfiles might help.

I guess the only option for now would be to install the Debian packages for TensorRT into the container yourself, which will be for Python 3.6. I’ll see if I can figure out why they did this.

Thanks, appreciate it. Ideally, I’d like to be able to use the same Dockerfile for x86 and Tegra, with a build script just supplying the base image as an argument, a bit like this:

ARG BASE_IMAGE

FROM ${BASE_IMAGE}
...
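
...and the build script would just pass the base image in, something like this (image names and tags are only illustrative):

# x86 build
docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt:21.07-py3 -t myapp:amd64 .
# Tegra build
docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime -t myapp:arm64 .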

Unfortunately I’m finding it never goes smoothly, since there are too many differences between the x86 and aarch64 images. In that same Dockerfile, there’s a line like this, for example:

RUN if [ -f "/opt/tensorrt/install_opensource.sh" ] ; then /opt/tensorrt/install_opensource.sh ; fi

That’s kind of ugly. And this stuff ends up taking up half the Dockerfile. I would greatly appreciate it if you had a test suite to check these kinds of things.

Yes, but that can’t be automated because the downloads are behind a login wall. I could COPY the packages into the image, but that would increase the image size since Docker layers are copy-on-write. Also, a bunch of NVIDIA L4T packages refuse to install on a non-l4t-base rootfs, and I don’t have the time to tear apart a bunch of Debian packages to find which preinst script is breaking things.

Likewise, l4t-base has no NVIDIA apt sources enabled, so an apt-get install tensorrt is out of the question. It’s possible to add the apt sources, but again, that’s a ton of ugliness and hacking, and it shouldn’t be necessary, because that image should ship with the sources out of the box (not rely on bind mounting all of the things). Please test both x86 and aarch64 images against common Dockerfiles and consider any differences breakage.
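
For reference, this is the sort of hacking I mean, just to get apt working at all (a sketch; release strings and paths are from memory and differ per L4T version and SoC):

# enable the L4T apt repos inside the container (may also need
# gnupg/ca-certificates installed first for the key fetch to work)
RUN apt-key adv --fetch-keys https://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
    echo "deb https://repo.download.nvidia.com/jetson/common r32.6 main" >> /etc/apt/sources.list && \
    echo "deb https://repo.download.nvidia.com/jetson/t194 r32.6 main" >> /etc/apt/sources.list && \
    apt-get update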

@mdegans, I’ve found out who maintains this container and have been discussing the issue with them. Hopefully we can take a look at this Python incompatibility in the next release.


Thanks. If there’s one bit of feedback I’d love to give your nvidia-docker-on-Tegra team, it’s that the bind mounting approach should be scrapped. It means you need different base image Dockerfiles for each architecture and for each JetPack release, and this divergence leads to countless bugs. Heck, it even breaks “docker build” unless you make the nvidia runtime the default.
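
(For anyone who lands here with that particular problem: the usual workaround is making nvidia the default runtime in /etc/docker/daemon.json, roughly like the sketch below, then restarting docker. Merge it with whatever is already in that file. It works, but it shouldn’t be needed.)

# sketch: make nvidia the default runtime so "docker build" also sees the
# bind-mounted libraries
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker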

If I pull ubuntu:latest from Docker Hub, I get the same thing on x86 and aarch64. My Dockerfiles that are FROM ubuntu:latest will “just work”. That means I write less platform-specific code, which is really, really nice. As soon as that breaks, so does everything downstream.
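
(You can see that shared manifest directly with a recent docker CLI; older versions may need experimental features enabled. Output trimmed to the relevant bits:)

$ docker manifest inspect ubuntu:latest | grep architecture
        "architecture": "amd64",
        "architecture": "arm64",
        ...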

Yes, it means the base image will be larger and you won’t be able to “cheat” by bind mounting all of the things at runtime, but that could be solved by releasing an “l4t-slim” or “core” image dedicated to just running containers. It’d be more repeatable. It’d be more reliable. The system attack surface would be minimized, and you’d avoid privilege escalation CVEs like this one, which are a direct result of the bind mounting approach.

This has been a pain point for our team as well because we have to inform them which directories/files to mount for our developed libraries. There are plans to move away from this approach to just plain stand-alone containers, but not sure what the timeline is for that.

Hey @ework, did your discussions with the maintainer come to anything? :)

Yes, this task has been picked up for development, but I’m not sure when it will be completed. Unfortunately (or fortunately, depending on how you see it), an L4T release based on Ubuntu 20.04, which would also solve this issue, is more likely to happen first.

Thanks for your reply @ework. I have two further questions:

Q1: The outdated Dockerfiles provided in nvidia/container-images/l4t-base are quite simple, so I genuinely wonder if there is more to it than just that Dockerfile. Why is it taking so long, given this has apparently been planned since early 2021?

Q2: May I also ask why that repo is not up to date? According to the L4T base NGC page, there seems to be a commitment to open-sourcing the Dockerfiles (see the quote below). Could you please pass feedback to the team that the repo hasn’t been updated?

Starting with the r32.4.3 release, the Dockerfile for the l4t-base docker image is also being provided. This can be accessed at this link. Users can use this to modify the contents to suit their need.

Thank you :)

This might be an answer to both of your questions. To be honest, I don’t know why that repo is so outdated or why what should be as simple as updating a Dockerfile isn’t being worked on sooner. There are QA testing requirements, and it’s possible some of the dependent Python modules don’t work as expected. I know we’re planning to move away from the approach where libraries from the host are overlaid into the container, towards standalone containers instead. The people working in this area are also quite busy right now with Orin-related work, and I’m more familiar with the packaging and distribution of our deep learning libraries than with the JetPack distribution.
