Bug Description
I’m completely new to Docker, but after trying unsuccessfully to install Torch-TensorRT and its dependencies natively, I wanted to try this approach instead. However, when I follow the instructions I run into a series of problems/bugs, described below:
To Reproduce
Steps to reproduce the behavior:
After installing Docker, run the following commands from the command prompt in a local directory:
docker pull nvcr.io/nvidia/pytorch:21.12-py3
git clone https://github.com/NVIDIA/Torch-TensorRT.git
cd Torch-TensorRT
docker build --build-arg BASE=21.12 -f docker/Dockerfile -t torch_tensorrt:latest .
[+] Building 1.4s (15/25)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.46kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 1.05kB 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:21.12-py3 0.0s
=> CACHED [base 1/1] FROM nvcr.io/nvidia/pytorch:21.12-py3 0.0s
=> [internal] load build context 0.5s
=> => transferring context: 26.61MB 0.4s
=> CACHED [torch-tensorrt-builder-base 1/5] RUN rm -rf /opt/torch-tensorrt /usr/bin/bazel 0.0s
=> CACHED [torch-tensorrt-builder-base 2/5] RUN [[ "amd64" == "amd64" ]] && ARCH="x86_64" || ARCH="amd64" && wget -q https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-linux-x86_64 -O /usr/bin/bazel && chmo 0.0s
=> CACHED [torch-tensorrt-builder-base 3/5] RUN touch /usr/lib/$HOSTTYPE-linux-gnu/libnvinfer_static.a 0.0s
=> CACHED [torch-tensorrt-builder-base 4/5] RUN rm -rf /usr/local/cuda/lib* /usr/local/cuda/include && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/lib /usr/local/cuda/lib64 && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux 0.0s
=> CACHED [torch-tensorrt-builder-base 5/5] RUN apt-get update && apt-get install -y --no-install-recommends locales ninja-build && rm -rf /var/lib/apt/lists/* && locale-gen en_US.UTF-8 0.0s
=> [torch-tensorrt-builder 1/4] COPY . /workspace/torch_tensorrt/src 0.2s
=> [torch-tensorrt 1/11] COPY . /workspace/torch_tensorrt 0.1s
=> [torch-tensorrt-builder 2/4] WORKDIR /workspace/torch_tensorrt/src 0.0s
=> [torch-tensorrt-builder 3/4] RUN cp ./docker/WORKSPACE.docker WORKSPACE 0.3s
=> ERROR [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh 0.3s
------
> [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh:
#15 0.272 /bin/bash: ./docker/dist-build.sh: /bin/bash^M: bad interpreter: No such file or directory
------
executor failed running [/bin/sh -c ./docker/dist-build.sh]: exit code: 126
To solve this issue I followed the suggestion here and ran (substituting the offending script for scriptname.sh):
sed -i -e 's/\r$//' scriptname.sh
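The `^M: bad interpreter` error means the script was checked out with Windows (CRLF) line endings, so the kernel looks for an interpreter literally named `/bin/bash\r`. A minimal, self-contained demo of the fix (the `demo.sh` file name is just an example, not part of the repo):

```shell
# Create a script with Windows line endings, strip the carriage
# returns with the same sed command as above, and run it.
printf '#!/bin/bash\r\necho ok\r\n' > demo.sh
sed -i -e 's/\r$//' demo.sh
bash demo.sh   # now runs cleanly
# To fix every shell script in a checkout at once (pattern is an example):
# find . -name '*.sh' -exec sed -i -e 's/\r$//' {} +
```

Re-cloning after setting `git config --global core.autocrlf input` (or `false`) avoids the problem at the source, since Git for Windows converts LF to CRLF on checkout by default.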
Then, I retried with
docker build --build-arg BASE=21.12 -f docker/Dockerfile -t torch_tensorrt:latest .
And this time the error was:
[+] Building 118.9s (15/25)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 35B 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:21.12-py3 0.0s
=> CACHED [base 1/1] FROM nvcr.io/nvidia/pytorch:21.12-py3 0.0s
=> [internal] load build context 0.2s
=> => transferring context: 48.70kB 0.2s
=> CACHED [torch-tensorrt-builder-base 1/5] RUN rm -rf /opt/torch-tensorrt /usr/bin/bazel 0.0s
=> CACHED [torch-tensorrt-builder-base 2/5] RUN [[ "amd64" == "amd64" ]] && ARCH="x86_64" || ARCH="amd64" && wget -q https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-linux-x86_64 -O /usr/bin/bazel && chmo 0.0s
=> CACHED [torch-tensorrt-builder-base 3/5] RUN touch /usr/lib/$HOSTTYPE-linux-gnu/libnvinfer_static.a 0.0s
=> CACHED [torch-tensorrt-builder-base 4/5] RUN rm -rf /usr/local/cuda/lib* /usr/local/cuda/include && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/lib /usr/local/cuda/lib64 && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux 0.0s
=> CACHED [torch-tensorrt-builder-base 5/5] RUN apt-get update && apt-get install -y --no-install-recommends locales ninja-build && rm -rf /var/lib/apt/lists/* && locale-gen en_US.UTF-8 0.0s
=> [torch-tensorrt-builder 1/4] COPY . /workspace/torch_tensorrt/src 0.1s
=> [torch-tensorrt 1/11] COPY . /workspace/torch_tensorrt 0.1s
=> [torch-tensorrt-builder 2/4] WORKDIR /workspace/torch_tensorrt/src 0.0s
=> [torch-tensorrt-builder 3/4] RUN cp ./docker/WORKSPACE.docker WORKSPACE 0.2s
=> ERROR [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh 118.0s
------
> [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh:
#15 2.846 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#15 2.846 running bdist_wheel
#15 2.888 Extracting Bazel installation...
#15 5.161 Starting local Bazel server and connecting to it...
#15 6.413 Loading:
#15 6.416 Loading: 0 packages loaded
#15 7.420 Loading: 0 packages loaded
#15 8.415 Analyzing: target //:libtorchtrt (1 packages loaded, 0 targets configured)
#15 9.421 Analyzing: target //:libtorchtrt (35 packages loaded, 75 targets configured)
#15 10.08 INFO: Analyzed target //:libtorchtrt (43 packages loaded, 2967 targets configured).
#15 10.08 INFO: Found 1 target...
#15 10.14 [0 / 117] [Prepa] Writing file cpp/lib/libtorchtrt.so-2.params
#15 11.14 [160 / 465] [Prepa] action 'SolibSymlink _solib_k8/_U@cuda_S_S_Ccublas___Ulib64/libcublas.so' ... (2 actions, 0 running)
#15 12.43 [629 / 731] [Prepa] action 'SolibSymlink _solib_k8/_U@libtorch_S_S_Ctorch___Ulib/libtorch_cpu.so' ... (4 actions, 3 running)
#15 13.44 [631 / 731] Compiling core/util/trt_util.cpp; 1s processwrapper-sandbox ... (5 actions running)
#15 14.52 [631 / 731] Compiling core/util/trt_util.cpp; 2s processwrapper-sandbox ... (5 actions running)
#15 17.73 [631 / 731] Compiling core/util/trt_util.cpp; 5s processwrapper-sandbox ... (6 actions, 5 running)
#15 19.67 [632 / 731] Compiling core/util/trt_util.cpp; 7s processwrapper-sandbox ... (6 actions, 5 running)
#15 22.03 [633 / 731] Compiling core/util/trt_util.cpp; 10s processwrapper-sandbox ... (6 actions, 5 running)
#15 25.15 [634 / 731] Compiling core/util/trt_util.cpp; 13s processwrapper-sandbox ... (6 actions, 5 running)
#15 29.72 [637 / 731] Compiling core/util/trt_util.cpp; 17s processwrapper-sandbox ... (6 actions, 5 running)
#15 50.81 [637 / 731] Compiling core/util/trt_util.cpp; 36s processwrapper-sandbox ... (6 actions, 5 running)
#15 73.30 [637 / 731] Compiling core/util/trt_util.cpp; 59s processwrapper-sandbox ... (6 actions, 5 running)
#15 83.28 [637 / 731] Compiling core/util/trt_util.cpp; 70s processwrapper-sandbox ... (6 actions, 5 running)
#15 104.4 [637 / 731] Compiling core/util/trt_util.cpp; 91s processwrapper-sandbox ... (6 actions, 5 running)
#15 113.1 ERROR: /workspace/torch_tensorrt/src/core/plugins/BUILD:10:11: Compiling core/plugins/impl/interpolate_plugin.cpp failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 62 argument(s) skipped)
#15 113.1
#15 113.1 Use --sandbox_debug to see verbose messages from the sandbox
#15 113.1 gcc: fatal error: Killed signal terminated program cc1plus
#15 113.1 compilation terminated.
#15 114.0 Target //:libtorchtrt failed to build
#15 114.0 Use --verbose_failures to see the command lines of failed build steps.
#15 114.1 INFO: Elapsed time: 111.118s, Critical Path: 101.93s
#15 114.1 INFO: 643 processes: 637 internal, 6 processwrapper-sandbox.
#15 114.1 FAILED: Build did NOT complete successfully
#15 114.2 FAILED: Build did NOT complete successfully
#15 114.3 using CXX11 ABI build
#15 114.3 building libtorchtrt
------
executor failed running [/bin/sh -c ./docker/dist-build.sh]: exit code: 1
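For context on this second failure: `gcc: fatal error: Killed signal terminated program cc1plus` usually means the compiler was killed by the Linux out-of-memory killer, i.e. the VM backing Docker Desktop ran out of RAM during the parallel Bazel build. On Windows with the WSL2 backend, the VM's memory limit can be raised via a `%UserProfile%\.wslconfig` file; the values below are examples only, not known-good settings for this build:

```ini
# %UserProfile%\.wslconfig — example values, adjust to your hardware
[wsl2]
memory=12GB  # default is roughly half of the host's RAM
swap=8GB
```

After saving the file, running `wsl --shutdown` and restarting Docker Desktop applies the new limit.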
What am I doing wrong? It may be something completely trivial, since I have no experience with Docker.
Expected behavior
The Docker image builds without errors.
Environment
- Torch-TensorRT Version (e.g. 1.0.0): 1.0.0 (latest)
- PyTorch Version (e.g. 1.0): 1.10
- CPU Architecture: AMD64
- OS: Windows 10
- How you installed PyTorch: pip & LibTorch
- Python version: 3.9.9
- CUDA version: 10.2
- GPU models and configuration: GeForce RTX 2080