TensorRT L4T docker image Python version Issue

In the TensorRT L4T docker image, the default python version is 3.8, but apt packages like python3-dev install the 3.6 versions (so package building is broken), and any python3-foo packages installed through apt aren’t found by the default python. For some packages, like python3-opencv, building from source takes prohibitively long on Tegra, so software that relies on both it and TensorRT can’t work, at least with the default python3 version.

Example:

root@ab4490a9c568:/app# apt-get install python3-opencv
Reading package lists... Done
Building dependency tree       
Reading state information... Done
python3-opencv is already the newest version (3.2.0+dfsg-4ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@ab4490a9c568:/app# python3
Python 3.8.0 (default, Feb 25 2021, 22:10:10) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cv2'
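The root cause, as far as I can tell: bionic’s python3-opencv ships cv2 as an extension module tagged for CPython 3.6’s ABI (cv2.cpython-36m-aarch64-linux-gnu.so), which python 3.8 will never load, even though /usr/lib/python3/dist-packages is on its sys.path. A quick illustrative check for such mismatches (the helper name is mine, not part of any tooling):

```python
import re
import sys

def abi_mismatch(filename):
    """Return True if a compiled extension module's filename carries a
    CPython version tag that doesn't match the running interpreter."""
    m = re.search(r"\.cpython-(\d)(\d+)", filename)
    if not m:
        # untagged or abi3 modules are assumed compatible
        return False
    return (int(m.group(1)), int(m.group(2))) != sys.version_info[:2]

# the module python3-opencv installs on bionic is 3.6-only:
print(abi_mismatch("cv2.cpython-36m-aarch64-linux-gnu.so"))
```

Pure-python apt packages keep working under 3.8; it’s only these ABI-tagged extensions that silently vanish as ModuleNotFoundError.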

Also, it looks like a manual symlink was made from python3.8 to python3 instead of using update-alternatives. You can set it like this instead (for 3.6):

# update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
update-alternatives: using /usr/bin/python3.6 to provide /usr/bin/python3 (python3) in auto mode

That way the package manager knows what the default Python version is. Still won’t provide missing python-foo packages, but it’s a start.
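A fuller sketch of what I mean, assuming both interpreters exist at /usr/bin/python3.6 and /usr/bin/python3.8: register both so you can switch cleanly rather than hard-coding a symlink.

```shell
# register both interpreters; the higher priority wins in auto mode
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 2

# or pin one explicitly, regardless of priority
update-alternatives --set python3 /usr/bin/python3.8
```

These commands need root and are really image-build (Dockerfile RUN) material, not something to run in an entrypoint.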

It would be nice if TensorRT images worked the same on Tegra and x86. Unfortunately this can’t be relied on. Every time I have to port a Dockerfile to Tegra it’s at least a day of if-tegra workarounds, leading to build scripts and --build-arg=blabla, when ideally the same Dockerfile should just work on any platform, as is the case with images derived from ubuntu:latest. It would be really nice to just docker-compose up and have everything work, as there is nothing technically prohibiting this.

Hi,
We recommend you check the supported features at the link below.
https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html
You can refer to the link below for the full list of supported operators.
For unsupported operators, you need to create a custom plugin to support the operation.

Thanks!

Thanks for your reply,

I apologize for being unclear. This is about a misconfigured python setup in your L4T TensorRT base image, unfortunately. It has nothing to do with the model. The model we are using works fine outside the container on the same version of TensorRT.

The issue is that our software uses OpenCV for some preprocessing (not my decision), and that isn’t easily installable with the default version of python used (which does not match L4T’s).

I appreciate the move towards fully containerized solutions for Tegra, but your base image has too many differences from the x86 version to make it useful.

A Dockerfile written for x86 should “just work” on Tegra, as it does with “ubuntu:latest” or alpine or any number of other base images that manage multiple architectures (with a manifest). The base image should be the same. The package set and repos should all be the same. If they’re not, it should be considered broken.

I do this regularly with non-Nvidia docker images: FROM ubuntu:latest ... and it works on everything. When I am forced to work with Nvidia base images on Tegra, on the other hand, I spend more time working around breakage like this than actually developing software.

That’s not an exaggeration. It seems as if there is no quality control at all on this. You should have a series of Dockerfiles you test against both arm64 and amd64 base images. It’s clear that’s not done, and it’s very, very frustrating. On x86, your images are a pleasure. On arm, very much the opposite.

Hi,

Have you tried the TensorRT NGC containers (NVIDIA NGC) to see if they serve your purpose?

Thank you.

I am referring to the NGC image. Sorry I wasn’t clear on this:

nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime

Hi,

This looks out of scope for TensorRT. Maybe the following link will be helpful to you: python - “No module named ‘cv2’” but it is installed - Stack Overflow

You can also try using TensorRT NGC container, if it serves your purpose.
https://ngc.nvidia.com/containers/nvidia:tensorrt

Thank you.

It looks like I’m not being clear enough. The problem is with your TensorRT NGC image, nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime

The problem is, again, that there are two python3 installs in that image. Python packages installed through apt-get do not always work with this setup. Yes, you can pip install them, but there are supply chain issues there, and the packages are not tested against other system packages. Further, if a wheel isn’t found, it can take hours to build packages on Tegra.

Your x86 image uses Ubuntu 20.04 as a base. If you did the same for Tegra it would solve the issue entirely, since its Python is 3.8, but the Tegra image is Ubuntu 18.04 based (python 3.6). If this somehow isn’t clear, I apologize (and give up, since it seems I’m not getting through).

@mdegans, are you saying that the L4T container has python 3.8 installed even though the base OS (Ubuntu 18.04) is using python 3.6? On the bare-metal (outside the container) TensorRT Debian packages only support python 3.6 on JetPack since it’s the default system python. See Support Matrix :: NVIDIA Deep Learning TensorRT Documentation

Yes. Exactly that. Some python3-foo packages will still work depending on where their installer places them (see sys.path), but not all:

 $ sudo docker run -it --rm nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime
root@888659f49bd5:/# python3
Python 3.8.0 (default, Feb 25 2021, 22:10:10) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']
>>> exit()
root@888659f49bd5:/# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"

Again, apologies for not being clearer before. The image is apparently ubuntu:bionic based (like l4t) but has python 3.8. That’s doable, but it breaks some python packages distributed through apt-get. I assumed this was intentional, for compat with the x86 image, which is based off ubuntu:focal. I guess not.

I was originally going to say it was not intentional, but I think the build system is common between x86 and aarch64. This might have motivated the choice for using python 3.8. If it’s like the x86 container then the python bindings for TensorRT and other libraries are in /usr/local/... and installed using a whl file rather than Debian packages. As you stated this means you will not be able to install packages using apt and have them work in the default python3 environment. You’ll need to use pip or other methods to install what you need.

The issue is that not all packages have pip wheels on arm64, and building them can take a very long time. Could you use ubuntu:focal as a base image on Tegra instead?

Also, I tend to trust packages from Canonical more than PyPI, given supply chain issues. The former are signed with Canonical’s gpg key. With the latter, I’m one typo away from installing malware.

I kind of doubt that. There are a bunch more differences between the arm64 and amd64 images. A test suite of common Dockerfiles might help.

I guess the only option for now would be to install the Debian packages for TensorRT yourself into the container, which will be for 3.6. I’ll see if I can figure out why they did this.

Thanks. Appreciate it. Ideally, I’d like to be able to use the same Dockerfile for x86 and Tegra, with a build script just supplying the base image as an argument, a bit like this:

ARG BASE_IMAGE

FROM ${BASE_IMAGE}
...

Unfortunately I’m finding it never goes smoothly, since there are too many differences between the x86 and aarch64 images. In the same Dockerfile there’s a line like this, for example:

RUN if [ -f "/opt/tensorrt/install_opensource.sh" ] ; then /opt/tensorrt/install_opensource.sh ; fi

That’s kind of ugly, and this stuff ends up taking up half the Dockerfile. I would greatly appreciate it if you had a test suite to check these kinds of things.
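For comparison, here’s roughly what the wrapper script looks like today. The image tags are only examples of what each platform currently needs; substitute whatever release you target.

```shell
#!/bin/sh
# pick_base: map a machine architecture to the TensorRT base image.
# Tags below are illustrative, not authoritative.
pick_base() {
    case "$1" in
        x86_64)  echo "nvcr.io/nvidia/tensorrt:21.07-py3" ;;
        aarch64) echo "nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime" ;;
        *)       echo "unsupported arch: $1" >&2; return 1 ;;
    esac
}

# usage (same Dockerfile everywhere, only the base image differs):
#   docker build --build-arg BASE_IMAGE="$(pick_base "$(uname -m)")" -t myapp .
```

This is exactly the kind of platform-specific glue that wouldn’t be needed if both images shared a base and package set under one manifest.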

Yes, but that can’t be automated because the downloads are behind a login wall. I could COPY it into the image, but that would increase the image size since docker layers are COW. Also, a bunch of nvidia l4t packages refuse to install on a non-l4t-base rootfs, and I don’t have the time to tear apart a bunch of debian packages to find which preinst script is breaking things.

Likewise, l4t-base has no nvidia apt sources enabled, so an apt-get install tensorrt is out of the question. It’s possible to add the apt sources, but again, it’s a ton of ugliness and hacking, and it shouldn’t be necessary, because that image should have the sources out of the box (not relying on bind mounting all of the things). Please test both x86 and aarch64 images against common Dockerfiles and consider any differences breakage.
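For reference, enabling those sources by hand looks something like this. The release string (r32.6) and the SoC repo (t194, for Xavier) are assumptions for JetPack 4.6-era images; adjust both for your module and release.

```dockerfile
# enable the L4T apt repos inside the container (hack; the image
# should really ship with these out of the box)
RUN apt-key adv --fetch-keys https://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
    echo "deb https://repo.download.nvidia.com/jetson/common r32.6 main" \
      > /etc/apt/sources.list.d/nvidia-l4t.list && \
    echo "deb https://repo.download.nvidia.com/jetson/t194 r32.6 main" \
      >> /etc/apt/sources.list.d/nvidia-l4t.list && \
    apt-get update
```

Even then, some packages’ preinst scripts still expect an l4t-base rootfs, which is the earlier problem all over again.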

@mdegans, I’ve found out who maintains this container and have been discussing the issue with them. Hopefully we can take a look at this python incompatibility in the next release.

Thanks. If there’s one bit of feedback I’d love to give your nvidia-docker on Tegra team, it’s that the bind mounting approach should be scrapped. It means you need different base image Dockerfiles for each architecture and for each JetPack release, and this divergence leads to countless bugs. Heck, it even breaks “docker build” unless you make the nvidia runtime the default.

If I pull ubuntu:latest from docker hub, I get the same thing on x86 and aarch64. My Dockerfiles that are FROM ubuntu:latest will “just work”. That means I write less platform specific code which is really, really nice. As soon as that breaks, so does everything downstream.

Yes, it means the base image will be larger and you won’t be able to “cheat” by bind mounting everything at runtime, but this could be solved by releasing an “l4t-slim” or “core” image dedicated to just running containers. It’d be more repeatable. It’d be more reliable. The system attack surface would be minimized, and you’d avoid privilege escalation CVEs like this one, a direct result of the bind mounting approach.

This has been a pain point for our team as well, because we have to inform them which directories/files to mount for our developed libraries. There are plans to move away from this approach to plain stand-alone containers, but I’m not sure what the timeline is for that.