Detectron2 libcurand.so.10

Hi,

I’m trying to get Detectron2 working in a container on my Jetson Nano.

On the host, /etc/nv_tegra_release contains:

# R32 (release), REVISION: 5.0, GCID: 25531747, BOARD: t210ref, EABI: aarch64, DATE: Fri Jan 15 22:55:35 UTC 2021

My Docker container is using a matching tag as recommended.

My Dockerfile currently looks like this:

FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev build-essential git wget sudo ninja-build libopenblas-base libopenmpi-dev

# create a non-root user
ARG USER_ID=1000
RUN useradd -m --no-log-init --system  --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser

ENV PATH="/home/appuser/.local/bin:${PATH}"
RUN wget https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py --user && \
    rm get-pip.py

RUN pip install --user cmake   # cmake from apt-get is too old

# Detectron2
RUN sudo apt install -y libjpeg-dev zlib1g-dev
RUN git clone https://github.com/facebookresearch/detectron2.git
RUN pip install -e detectron2

However it fails at the detectron2 installation with:

Obtaining file:///home/appuser/detectron2
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/appuser/detectron2/setup.py'"'"'; __file__='"'"'/home/appuser/detectron2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-jv51b9d7
         cwd: /home/appuser/detectron2/
    Complete output (11 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/appuser/detectron2/setup.py", line 10, in <module>
        import torch
      File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 189, in <module>
        _load_global_deps()
      File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 142, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: libcurand.so.10: cannot open shared object file: No such file or directory

Some searching suggests that libcurand.so.10 belongs to CUDA 10, which I assume I don’t have in the container? However, I have seen successful builds, so I’m not sure where I’m going wrong.
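One way to narrow this down is to attempt the same kind of load that torch performs in the traceback, but outside of pip. This is only a minimal sketch, assuming it is run with python3 inside the container; the helper name can_load is made up for illustration, and the library name is the one from the error message.

```python
import ctypes


def can_load(lib_name):
    """Try to open a shared library by name, the way ctypes.CDLL does in the traceback.

    Returns (True, None) on success, or (False, error text) if the loader fails.
    """
    try:
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Library name taken from the OSError above.
    ok, err = can_load("libcurand.so.10")
    if ok:
        print("libcurand.so.10: found")
    else:
        print("libcurand.so.10: missing:", err)
```

Running it once in a RUN step during the build and once in a started container shows whether the library is missing everywhere or only while the image is being built.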

Any help is much appreciated!

I believe I managed to fix it: the issue was the user creation step, so the new Dockerfile drops the appuser setup and builds as root.

The new Dockerfile is below:

# Base on NVIDIA machine learning container
# It already contains PyTorch, TensorFlow and required dependencies.
FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev build-essential git wget sudo ninja-build libopenblas-base libopenmpi-dev

ENV PATH="/root/.local/bin:${PATH}"
RUN wget https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py --user && \
    rm get-pip.py

# Installation fails without manually installing the correct version of PyYAML
RUN pip install --user PyYAML==5.4.1

# Detectron2
RUN sudo apt install -y libjpeg-dev zlib1g-dev
RUN git clone https://github.com/facebookresearch/detectron2.git

# Set FORCE_CUDA because CUDA is not accessible during `docker build`
ENV FORCE_CUDA="1"
ARG TORCH_CUDA_ARCH_LIST="Maxwell"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"

RUN pip install -e detectron2

# Set a fixed model cache directory.
ENV FVCORE_CACHE="/tmp"
WORKDIR /detectron2
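To confirm that an image built this way is usable, a quick check inside a running container can help. The following is a rough sketch, not part of the original setup: the function name check_install is hypothetical, and it only reports whether torch and detectron2 import and whether torch can see a CUDA device.

```python
import importlib


def check_install():
    """Report whether torch and detectron2 import, and whether CUDA is visible to torch."""
    report = {}
    for name in ("torch", "detectron2"):
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except (ImportError, OSError) as exc:
            # OSError covers missing shared libraries such as libcurand.so.10.
            report[name] = "failed: {}".format(exc)
    if report["torch"] == "ok":
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_install().items():
        print("{}: {}".format(key, value))
```

Because FORCE_CUDA is set at build time, the CUDA check is only meaningful in a started container, not in a RUN step.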

@XDGFX
For Detectron2, you may refer to detectron2/INSTALL.md in the facebookresearch/detectron2 repository on GitHub.
I did not try it in a dockerized form, but it builds from source on Jetson after installing the prerequisites.