trt_pose Docker

Hi!

I’m trying to set up a build system for a project that uses the Jetson Xavier and trt_pose (among other things). I want to create a Docker container for my application and build the image on an x86 host, so I can put it on a server somewhere and push it to a fleet of Xaviers.

I’m running into problems when I try to install the trt_pose package. Running python3 setup.py install gives this error:

Traceback (most recent call last):
  File "setup.py", line 2, in <module>
    from torch.utils import cpp_extension
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 188, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 141, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

How can I resolve this? I have installed the CUDA cross-compilation tools using the SDK Manager, and I have set the nvidia Docker runtime as the default.
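For reference, my /etc/docker/daemon.json on the x86 host looks like this (the standard nvidia-container-runtime setup, paths as in the NVIDIA docs):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```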

Hi,

Please note that you need to install the same CUDA version as on the Jetson, since the image mounts the libraries from the host.
If everything is installed correctly, could you check whether you can build and run the deviceQuery example?
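For example, with a Dockerfile along these lines (a sketch based on the Jetson container wiki; copy the samples folder from /usr/local/cuda/samples next to the Dockerfile first, and adjust the tag to your JetPack release):

```
FROM nvcr.io/nvidia/l4t-base:r32.4.4
RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples
WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make
CMD ["./deviceQuery"]
```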

Thanks.

I think everything is correctly installed. I can build and run the deviceQuery example without any problem (after changing the base Docker image to r32.4.4). Here is the output:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 7772 MBytes (8149057536 bytes)
  ( 6) Multiprocessors, ( 64) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            1109 MHz (1.11 GHz)
  Memory Clock rate:                             1109 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

But I still can’t build my container. Here is a minimal example of the Dockerfile:

# This includes L4T (with CUDA etc) and PyTorch 1.6
FROM nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.6-py3

# Use bash from here
SHELL ["/bin/bash", "-c"]

# Torch2TRT
RUN cd ~ \
 && git clone https://github.com/NVIDIA-AI-IOT/torch2trt \
 && cd torch2trt && python3 setup.py install --plugins

# Trt_pose
RUN pip3 install tqdm cython pycocotools \
 && apt-get update \
 && apt-get install -y python3-matplotlib \
 && cd ~ \
 && git clone https://github.com/NVIDIA-AI-IOT/trt_pose \
 && cd trt_pose && python3 setup.py install

# Start command line on start
CMD ["/bin/bash"]

Which gives the error posted above.

So I’m not sure what is wrong. I have inspected one of the intermediate containers and, indeed, the file libcurand.so.10 does not exist. On my host, /usr/local/cuda-10.2/targets contains two folders: aarch64-linux, which is the one that is also visible inside the container, and x86_64-linux. The x86_64-linux/lib folder has libcurand.so.10, but aarch64-linux/lib does not. It only has:

libcudadevrt.a     libcudart.so.10.2.89  libcufft_static_nocallback.a  libcupti.so          libcurand_static.a    liblapack_static.a  libnppial_static.a   libnppidei_static.a  libnppim_static.a   libnppitc_static.a   libnvperf_host.so     libnvrtc-builtins.so.10.2
libcudart.so       libcudart_static.a    libcufftw_static.a            libcupti.so.10.2     libcusolver_static.a  libmetis_static.a   libnppicc_static.a   libnppif_static.a    libnppist_static.a  libnpps_static.a     libnvperf_target.so   libnvrtc-builtins.so.10.2.89
libcudart.so.10.2  libcufft_static.a     libculibos.a                  libcupti.so.10.2.75  libcusparse_static.a  libnppc_static.a    libnppicom_static.a  libnppig_static.a    libnppisu_static.a  libnvgraph_static.a  libnvrtc-builtins.so  stubs

However, on my Xavier NX the aarch64-linux/lib folder does contain libcurand.so.10:

libcudadevrt.a         libcufft_static.a             libcuinj64.so.10.2.89   libcurand_static.a        libcusparse_static.a  libnppial.so.10         libnppicom.so.10         libnppif.so.10         libnppim.so.10          libnppisu.so.10         libnpps.so.10          libnvperf_target.so           libnvToolsExt.so.1
libcudart.so           libcufft_static_nocallback.a  libculibos.a            libcusolver.so            liblapack_static.a    libnppial.so.10.2.1.89  libnppicom.so.10.2.1.89  libnppif.so.10.2.1.89  libnppim.so.10.2.1.89   libnppisu.so.10.2.1.89  libnpps.so.10.2.1.89   libnvrtc-builtins.so          libnvToolsExt.so.1.0.0
libcudart.so.10.2      libcufftw.so                  libcupti.so             libcusolver.so.10         libmetis_static.a     libnppial_static.a      libnppicom_static.a      libnppif_static.a      libnppim_static.a       libnppisu_static.a      libnpps_static.a       libnvrtc-builtins.so.10.2     stubs
libcudart.so.10.2.89   libcufftw.so.10               libcupti.so.10.2        libcusolver.so.10.3.0.89  libnppc.so            libnppicc.so            libnppidei.so            libnppig.so            libnppist.so            libnppitc.so            libnvgraph.so          libnvrtc-builtins.so.10.2.89
libcudart_static.a     libcufftw.so.10.1.2.89        libcupti.so.10.2.75     libcusolver_static.a      libnppc.so.10         libnppicc.so.10         libnppidei.so.10         libnppig.so.10         libnppist.so.10         libnppitc.so.10         libnvgraph.so.10       libnvrtc.so
libcufft.so            libcufftw_static.a            libcurand.so            libcusparse.so            libnppc.so.10.2.1.89  libnppicc.so.10.2.1.89  libnppidei.so.10.2.1.89  libnppig.so.10.2.1.89  libnppist.so.10.2.1.89  libnppitc.so.10.2.1.89  libnvgraph.so.10.2.89  libnvrtc.so.10.2
libcufft.so.10         libcuinj64.so                 libcurand.so.10         libcusparse.so.10         libnppc_static.a      libnppicc_static.a      libnppidei_static.a      libnppig_static.a      libnppist_static.a      libnppitc_static.a      libnvgraph_static.a    libnvrtc.so.10.2.89
libcufft.so.10.1.2.89  libcuinj64.so.10.2            libcurand.so.10.1.2.89  libcusparse.so.10.3.1.89  libnppial.so          libnppicom.so           libnppif.so              libnppim.so            libnppisu.so            libnpps.so              libnvperf_host.so      libnvToolsExt.so
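To see exactly which shared libraries the cross toolkit is missing relative to the device, I used a small helper along these lines (a sketch; the paths in the comment are my local ones, and the device listing was copied over to the host first):

```python
from pathlib import Path

def missing_shared_libs(target_dir, reference_dir):
    """Return names of .so* files present in reference_dir but absent from target_dir."""
    target = {p.name for p in Path(target_dir).glob("*.so*")}
    reference = {p.name for p in Path(reference_dir).glob("*.so*")}
    return reference - target

# Hypothetical local paths: the cross toolkit's lib folder on the x86 host
# versus a copy of the Jetson's aarch64-linux/lib folder:
# print(sorted(missing_shared_libs(
#     "/usr/local/cuda-10.2/targets/aarch64-linux/lib",
#     "/tmp/jetson-aarch64-lib-copy",
# )))
```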

So why is that? Is there something wrong with my installation on the host x86 system?

Any updates on this?

Hi,

Sorry for the late update.

May I know which CUDA package you installed on the host?
Have you installed the cross package cuda-repo-cross-aarch64-10-2-local-10.2.89_1.0-1_all.deb from the SDK Manager?

It includes a related library that might solve your issue.

$ dpkg -c cuda-repo-cross-aarch64-10-2-local-10.2.89_1.0-1_all.deb
...
-rw-r--r-- root/root      2550 2019-10-30 08:29 ./var/cuda-repo-10-2-local-10.2.89-cross-aarch64/cuda-cross-aarch64_10.2.89-1_all.deb
...

Could you give it a try first?

Thanks.

Hi,

I already had it installed, and it still does not work. But I’ve been told elsewhere that this approach is never going to work: running setup.py for trt_pose imports torch, and importing torch loads the CUDA runtime libraries at import time, which are not available during an x86-hosted build.
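For anyone curious about the mechanics: torch’s __init__.py dlopens its CUDA dependencies with ctypes at import time, so the failure happens before any compilation starts. A minimal way to check what the dynamic linker can see (a sketch, assuming a Linux environment):

```python
import ctypes
import ctypes.util

# find_library asks the dynamic linker whether a library is resolvable;
# it returns None when the .so cannot be found, which is the situation
# for libcurand inside the x86-hosted aarch64 build container.
for name in ("curand", "cudart"):
    print(name, "->", ctypes.util.find_library(name))

# torch/__init__.py effectively does the equivalent of the following,
# which raises OSError when libcurand.so.10 is missing:
#   ctypes.CDLL("libcurand.so.10", mode=ctypes.RTLD_GLOBAL)
```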

So I set up a Xavier as a build machine instead.
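For anyone who still wants a pure x86 build: an untested workaround sketch is to clone at build time but defer the torch-importing install step to the first run on the device, where the NVIDIA runtime mounts the CUDA libraries:

```
# Clone at build time (no torch import needed here)
RUN cd ~ && git clone https://github.com/NVIDIA-AI-IOT/trt_pose
# Run setup.py (which imports torch) on the Jetson at first start
CMD ["/bin/bash", "-c", "cd ~/trt_pose && python3 setup.py install && exec /bin/bash"]
```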

Hi,

Thanks for your feedback.

L4T Docker on an x86 machine still has some limitations.
Your issue may be caused by the limitation described here:
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson#building-jetson-containers-on-an-x86-workstation-using-qemu

Known limitation: Unfortunately you won’t be able to run any binary that calls into the NVIDIA driver on the x86 host.

But the L4T container should work correctly on the Jetson itself.

Thanks.