Jetson could not load library dlopen error in ROS container

hi~

I tried compiling my ROS code in the Docker container. My environment:
Device: Jetson AGX Orin
Jetpack: 5.0.1
Docker image: dustynv/ros galactic-ros-base-l4t-r34.1.1

When I compiled on the host, everything worked fine. When I run the application after compiling the code in the container, I get the following error:

Could not load library dlopen error: /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so: file too short, at /opt/ros/galactic/src/rcutils/src/shared_library.c:99

I tried to find the source file named in the error, but it does not exist in the container:

/home# cat /opt/ros/galactic/src/rcutils/src/shared_library.c
cat: /opt/ros/galactic/src/rcutils/src/shared_library.c: No such file or directory

I spent some time cutting out almost all the code. In the example code in visual_test.zip (4.9 KB), the ROS launch system starts the nodes defined in cv_base_node.cpp, and the main functionality of the application is implemented in TRTModule.hpp (I even removed the code that calls TRTModule.hpp from cv_base_node.hpp). When I run it, I can reproduce the problem. When I removed the createInferBuilder call at TRTModule.cpp:34:

// ...
void TRTModule::build_engine_from_onnx(const std::string &onnx_file)
{
    std::cout << "[INFO]: build engine from onnx" << std::endl;
    auto builder = createInferBuilder(gLogger);
    // ...
}
// ...

it worked fine afterwards.

The problem appears to be due to TensorRT, but I can’t quite determine the cause, because everything worked fine when the code was built on the host.

Here is the process I tried to build:

nvidia@nvidia-desktop:~/Desktop$ docker images
REPOSITORY    TAG                             IMAGE ID       CREATED         SIZE
dustynv/ros   galactic-ros-base-l4t-r34.1.1   9988e698bd55   2 months ago    12.1GB
nvidia@nvidia-desktop:~/Desktop$ docker run -it -v /home/nvidia/Desktop/visual_test/:/home dustynv/ros:galactic-ros-base-l4t-r34.1.1 /bin/bash
sourcing   /opt/ros/galactic/install/setup.bash
ROS_ROOT   /opt/ros/galactic
ROS_DISTRO galactic
root@fb271e1a2336:/# cd home/
root@fb271e1a2336:/home# ls
CMakeLists.txt  include  launch  package.xml  src
root@fb271e1a2336:/home# colcon build
Starting >>> visual_test
--- stderr: visual_test                              
/home/src/TRTModule.cpp: In member function ‘void TRTModule::create_module(const string&)’:
/home/src/TRTModule.cpp:26:50: warning: unused parameter ‘onnx_file’ [-Wunused-parameter]
   26 | void TRTModule::create_module(const std::string &onnx_file)
      |                               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
/home/src/TRTModule.cpp: In member function ‘void TRTModule::build_engine_from_onnx(const string&)’:
/home/src/TRTModule.cpp:34:10: warning: unused variable ‘builder’ [-Wunused-variable]
   34 |     auto builder = createInferBuilder(gLogger);
      |          ^~~~~~~
/home/src/TRTModule.cpp:31:59: warning: unused parameter ‘onnx_file’ [-Wunused-parameter]
   31 | void TRTModule::build_engine_from_onnx(const std::string &onnx_file)
      |                                        ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
/home/src/TRTModule.cpp: In member function ‘void TRTModule::build_engine_from_cache(const string&)’:
/home/src/TRTModule.cpp:79:60: warning: unused parameter ‘cache_file’ [-Wunused-parameter]
   79 | void TRTModule::build_engine_from_cache(const std::string &cache_file)
      |                                         ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
/home/src/TRTModule.cpp: In member function ‘void TRTModule::cache_engine(const string&)’:
/home/src/TRTModule.cpp:84:49: warning: unused parameter ‘cache_file’ [-Wunused-parameter]
   84 | void TRTModule::cache_engine(const std::string &cache_file)
      |                              ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
In file included from /home/include/cv_base_node.hpp:17,
                 from /home/src/cv_base_node.cpp:1:
/opt/ros/galactic/install/include/image_transport/image_transport.h:41:89: note: #pragma message: Warning: This header is deprecated. Use 'image_transport.hpp' instead
   41 | #pragma message ("Warning: This header is deprecated. Use 'image_transport.hpp' instead")
      |                                                                                         ^
In file included from /home/include/cv_base_node.hpp:18,
                 from /home/src/cv_base_node.cpp:1:
/opt/ros/galactic/install/include/camera_info_manager/camera_info_manager.h:41:93: note: #pragma message: Warning: This header is deprecated. Use 'camera_info_manager.hpp' instead
   41 | #pragma message ("Warning: This header is deprecated. Use 'camera_info_manager.hpp' instead")
      |                                                                                             ^
---
Finished <<< visual_test [9.78s]

Summary: 1 package finished [9.91s]
  1 package had stderr output: visual_test
root@fb271e1a2336:/home# . install/setup.sh 
root@fb271e1a2336:/home# ros2 launch visual_test visual_launch.py
[INFO] [launch]: All log files can be found below /root/.ros/log/2022-08-13-13-17-28-509826-fb271e1a2336-303
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [Visual-1]: process started with pid [305]
[Visual-1] terminate called after throwing an instance of 'class_loader::LibraryLoadException'
[Visual-1]   what():  Could not load library dlopen error: /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so: file too short, at /opt/ros/galactic/src/rcutils/src/shared_library.c:99
[ERROR] [Visual-1]: process has died [pid 305, exit code -6, cmd '/home/install/visual_test/lib/visual_test/Visual --ros-args -r __node:=Visual'].

Can you help me?

Hi,

It seems there are some issues in the ‘libnvdla_compiler.so’ library.
The library is used when TensorRT generates a DLA-based engine.
So the application can work if it doesn’t use the TensorRT builder.

Could you check the library inside the Docker container to see whether the file is intact?

$ ll /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so

Thanks.

Hi @AastaLLL, I looked up the library, and everything seemed fine:

root@fb271e1a2336:/home# ros2 launch visual_test visual_launch.py
[INFO] [launch]: All log files can be found below /root/.ros/log/2022-08-15-03-27-19-581736-fb271e1a2336-345
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [Visual-1]: process started with pid [347]
[Visual-1] terminate called after throwing an instance of 'class_loader::LibraryLoadException'
[Visual-1]   what():  Could not load library dlopen error: /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so: file too short, at /opt/ros/galactic/src/rcutils/src/shared_library.c:99
[ERROR] [Visual-1]: process has died [pid 347, exit code -6, cmd '/home/install/visual_test/lib/visual_test/Visual --ros-args -r __node:=Visual'].
root@fb271e1a2336:/home# ll /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so
-rw-r--r-- 1 root root 0 May 23 13:49 /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so
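
Worth noting: the listing above reports a size of 0 bytes for libnvdla_compiler.so, which is consistent with the "file too short" message from dlopen. A minimal sketch for scanning a directory for other empty library files, assuming find is available in the container (the function name find_empty_libs is hypothetical, not part of any NVIDIA or ROS tooling):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every zero-byte shared-library file
# directly inside the given directory (no recursion).
find_empty_libs() {
    local dir="$1"
    # -size 0 matches only files that occupy zero 512-byte blocks,
    # i.e. files with no content at all.
    find "$dir" -maxdepth 1 -type f -name '*.so*' -size 0 -print
}

# Example (run inside the container):
#   find_empty_libs /usr/lib/aarch64-linux-gnu/tegra
```

If this prints any paths, those libraries cannot be loaded in the container as it is currently started.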

Hi,

Since we just released the GA software for Orin, would you mind testing it again with JetPack 5.0.2 GA?
Thanks.

Hi @AastaLLL,

I noticed earlier today that NVIDIA released JetPack 5.0.2 GA. I will install it later and see whether I can reproduce this problem. After finishing the test, I will reply again if it is not resolved.

In addition, I think that configuring the environment in Docker containers does not depend on the host system version. As @dusty_nv mentioned:

I wanted to be able to build a Docker image for production that does not rely on a specific JetPack version. The reason I am trying to use Docker is that my production environment is a Jetson NX module with JetPack 4.6 on a third-party carrier board. I reproduced the problem on the Jetson NX module. The manufacturer of the carrier board does not provide JetPack 5.x, and my work depends on Foxy or higher (which requires Ubuntu 20.04) because of known DDS bugs in ROS. Even if I can fix this with the newly released image, I still can’t apply the code to my production environment.

Maybe you have a better idea?

Hi,

We have tested your source on Orin with JetPack 5.0.2 GA.
The app doesn’t show any error but does not terminate.

Is this the expected behavior?

$ sudo docker run -it --rm --net=host --runtime nvidia -v /home/nvidia/topic_223838:/home -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix dustynv/ros:galactic-ros-base-l4t-r34.1.1
# cd /home/
# colcon build
#  . install/setup.sh
# ros2 launch visual_test visual_launch.py
[INFO] [launch]: All log files can be found below /root/.ros/log/2022-08-17-07-53-23-043614-ubuntu-313
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [Visual-1]: process started with pid [315] 

Thanks.

Hi @AastaLLL, with your help I re-tested the code in the JetPack 4.6, 5.0.1, and 5.0.2 GA environments, and it passed in all of them. I think I mistakenly started the container without the --runtime nvidia option. Thank you very much for the sample you provided, which helped me find the problem.
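
For anyone who hits the same error, a fail-fast check before `ros2 launch` may make the mistake easier to spot. This is only a sketch: the function name check_tegra_lib is hypothetical, and the default path is simply the one from the error message in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the library exists and is non-empty.
check_tegra_lib() {
    local lib="${1:-/usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so}"
    if [ -s "$lib" ]; then
        # -s is true when the file exists and has a size greater than zero.
        echo "ok: $lib"
    else
        echo "missing or empty: $lib (was the container started with --runtime nvidia?)" >&2
        return 1
    fi
}

# Example (inside the container, before launching):
#   check_tegra_lib && ros2 launch visual_test visual_launch.py
```

The check only tests that the file has content; it does not verify that the library is a valid ELF file or that it matches the installed TensorRT version.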
