Torch2trt fails when building a container from jetson-containers

Platform: Jetson Orin AGX
L4T: 36.4.0
JetPack: 6.1
CUDA: 12.6.68

Hello, I'm trying to build a Docker container using jetson-containers, and I keep getting an error when building a custom container.

When I try to build using jetson-containers build --name=amigo_detect/ ros:humble-ros-base nanoowl, the ROS part builds, but the nanoowl part fails with this error:

-- Building container amigo_detect/nanoowl:r36.4.0-torch2trt

DOCKER_BUILDKIT=0 docker build --network=host --tag amigo_detect/nanoowl:r36.4.0-torch2trt \
--file /home/castej-jetson/jetson-containers/packages/pytorch/torch2trt/Dockerfile \
--build-arg BASE_IMAGE=amigo_detect/nanoowl:r36.4.0-torchvision \
/home/castej-jetson/jetson-containers/packages/pytorch/torch2trt \
2>&1 | tee /home/castej-jetson/jetson-containers/logs/20250120_183217/build/amigo_detect_nanoowl_r36.4.0-torch2trt.txt; exit ${PIPESTATUS[0]}

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

Sending build context to Docker daemon  15.36kB
Step 1/6 : ARG BASE_IMAGE
Step 2/6 : FROM ${BASE_IMAGE}
 ---> 882d0e2cf731
Step 3/6 : ADD https://api.github.com/repos/NVIDIA-AI-IOT/torch2trt/git/refs/heads/master /tmp/torch2trt_version.json

 ---> 433fbe242b68
Step 4/6 : COPY patches/ /tmp/patches/
 ---> 6dd6285cdde7
Step 5/6 : RUN cd /opt &&     git clone --depth=1 https://github.com/NVIDIA-AI-IOT/torch2trt &&     cd torch2trt &&     cp /tmp/patches/flattener.py torch2trt &&     pip3 install --verbose . &&     sed 's|^set(CUDA_ARCHITECTURES.*|#|g' -i CMakeLists.txt &&     sed 's|Catch2_FOUND|False|g' -i CMakeLists.txt &&     cmake -B build -DCUDA_ARCHITECTURES=${CUDA_ARCHITECTURES} . &&     cmake --build build --target install &&     ldconfig &&     pip3 install --no-cache-dir --verbose nvidia-pyindex &&     pip3 install --no-cache-dir --verbose onnx-graphsurgeon
 ---> Running in f0f57afeed4b
Cloning into 'torch2trt'...
Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: https://pypi.jetson-ai-lab.dev/jp6/cu126
Processing /opt/torch2trt
  Preparing metadata (setup.py): started
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/opt/torch2trt/setup.py", line 2, in <module>
      import tensorrt
    File "/usr/local/lib/python3.10/dist-packages/tensorrt/__init__.py", line 75, in <module>
      from .tensorrt import *
  ImportError: libnvdla_compiler.so: cannot open shared object file: No such file or directory
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python3.10 -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/opt/torch2trt/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-0dvngrhj
  cwd: /opt/torch2trt/
  Preparing metadata (setup.py): finished with status 'error'
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
The command '/bin/bash -c cd /opt &&     git clone --depth=1 https://github.com/NVIDIA-AI-IOT/torch2trt &&     cd torch2trt &&     cp /tmp/patches/flattener.py torch2trt &&     pip3 install --verbose . &&     sed 's|^set(CUDA_ARCHITECTURES.*|#|g' -i CMakeLists.txt &&     sed 's|Catch2_FOUND|False|g' -i CMakeLists.txt &&     cmake -B build -DCUDA_ARCHITECTURES=${CUDA_ARCHITECTURES} . &&     cmake --build build --target install &&     ldconfig &&     pip3 install --no-cache-dir --verbose nvidia-pyindex &&     pip3 install --no-cache-dir --verbose onnx-graphsurgeon' returned a non-zero code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/castej-jetson/jetson-containers/jetson_containers/build.py", line 112, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api, args.skip_packages)
  File "/home/castej-jetson/jetson-containers/jetson_containers/container.py", line 147, in build_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)  
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag amigo_detect/nanoowl:r36.4.0-torch2trt --file /home/castej-jetson/jetson-containers/packages/pytorch/torch2trt/Dockerfile --build-arg BASE_IMAGE=amigo_detect/nanoowl:r36.4.0-torchvision /home/castej-jetson/jetson-containers/packages/pytorch/torch2trt 2>&1 | tee /home/castej-jetson/jetson-containers/logs/20250120_183217/build/amigo_detect_nanoowl_r36.4.0-torch2trt.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.

I definitely have libnvdla_compiler.so in /usr/lib/aarch64-linux-gnu/tegra/, so I'm not sure what is happening.

Hi,
Here are some suggestions for common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide of deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app with us so we can reproduce the issue locally.

Thanks!

Hi,

Step 4/6 : COPY patches/ /tmp/patches/
 ---> 6dd6285cdde7

Could you launch the 6dd6285cdde7 image and check if libnvdla_compiler.so is mounted into the container as well?

Thanks.

Hello, yes, it is found inside the container:

castej-jetson@castejjetson-desktop:~$ sudo docker run --runtime nvidia -it --rm --network=host 6dd6285cdde7
sourcing   /opt/ros/humble/install/setup.bash
ROS_DISTRO humble
ROS_ROOT   /opt/ros/humble
root@castejjetson-desktop:/# find / -name "libnvdla_compiler.so" 2>/dev/null
/usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so
root@castejjetson-desktop:/# 

Hi,

Thanks for the testing.

Could you also check if you can import tensorrt correctly?
If you can reproduce a similar error when importing tensorrt, could you check whether adding the libnvdla_compiler.so location to LD_LIBRARY_PATH helps?
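
For example, a quick check inside the container, using the library path from your find output above (just a sketch):

$ export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/nvidia:$LD_LIBRARY_PATH
$ python3 -c "import tensorrt; print(tensorrt.__version__)"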

If so, please help to update the Dockerfile accordingly and build it again.
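
A minimal sketch of such a Dockerfile change; the exact placement is an assumption, it just needs to come before the torch2trt build step:

# Hypothetical addition to packages/pytorch/torch2trt/Dockerfile
ENV LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/nvidia:${LD_LIBRARY_PATH}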

Thanks.

Hello @AastaLLL, thanks for the reply.
I am able to import tensorrt correctly and cannot reproduce a similar error.

root@castejjetson-desktop:/# python3
Python 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt
>>> tensorrt.__version__
'10.4.0'
>>> 
KeyboardInterrupt
>>> 
root@castejjetson-desktop:/# find / -name "libnvdla_compiler.so"
/usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so

So is the rest of the container looking for libnvdla_compiler.so in a different path?

Hi,

We have more information about this issue.

Since libnvdla_compiler.so is mounted by the NVIDIA runtime, it doesn’t exist during the build stage.
A possible WAR (workaround) is to build torch2trt in a container launched with the NVIDIA runtime and save the resulting package for direct installation.
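
A rough sketch of that approach; the image tag and git URL come from the build log above, but treat the exact steps (in particular the /wheels mount) as illustrative:

# Launch the last successfully built stage with the NVIDIA runtime,
# so libnvdla_compiler.so gets mounted in:
$ sudo docker run --runtime nvidia -it --rm -v $(pwd)/wheels:/wheels \
    amigo_detect/nanoowl:r36.4.0-torchvision

# Inside the container, build a torch2trt wheel into the mounted folder:
$ git clone --depth=1 https://github.com/NVIDIA-AI-IOT/torch2trt /opt/torch2trt
$ cd /opt/torch2trt
$ pip3 wheel --no-deps -w /wheels .

The wheel saved in ./wheels can then be COPY'd into the image and installed with pip3 during the Docker build.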

Below is a related discussion for your reference:

Thanks.

@AastaLLL I’m a bit confused about how to proceed: is the problem inherently fixed, or would I have to reflash the Jetson for it to be fixed?
Thanks

That file is installed via the nvidia-l4t-dla-compiler package.

dpkg -S /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so
nvidia-l4t-dla-compiler: /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so

You could try installing that package to see if it helps.
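
For instance, something along these lines inside the container (this assumes the L4T apt repositories are configured, which may not be the case in a minimal image):

$ sudo apt-get update
$ sudo apt-get install nvidia-l4t-dla-compiler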

Hi,

The issue comes from TensorRT requiring the DLA driver (libnvdla_compiler.so).
The DLA driver used to ship as part of the TensorRT library, but it was moved to an out-of-tree (OOT) driver starting with JetPack 6.1.

On Jetson, the driver is mounted through the NVIDIA Container Toolkit (--runtime nvidia).
This means the libraries are only present when a container is launched and don’t exist while a Dockerfile is being built.

A WAR for this is to either copy the driver into the image manually or build torch2trt beforehand in a running container, so that libnvdla_compiler.so exists when it is needed.
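
A minimal sketch of the manual-copy option, assuming libnvdla_compiler.so has first been copied from the host into the Docker build context:

# Hypothetical Dockerfile snippet: bake the DLA compiler into the image
# so that importing tensorrt works during the build stage.
COPY libnvdla_compiler.so /usr/lib/aarch64-linux-gnu/nvidia/
RUN ldconfig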

Thanks.