Can't build Torch-TensorRT Docker image on Windows

Bug Description

I’m completely new to Docker, but after trying unsuccessfully to install Torch-TensorRT and its dependencies directly, I wanted to try the Docker approach. However, when I follow the instructions, I run into a series of errors, described below:

To Reproduce

Steps to reproduce the behavior:

After installing Docker, run the following commands from a command prompt in a local directory:

  1. docker pull nvcr.io/nvidia/pytorch:21.12-py3
  2. git clone https://github.com/NVIDIA/Torch-TensorRT.git
  3. cd Torch-TensorRT
  4. docker build --build-arg BASE=21.12 -f docker/Dockerfile -t torch_tensorrt:latest .
[+] Building 1.4s (15/25)
 => [internal] load build definition from Dockerfile                                                                                                                                                                                    0.0s
 => => transferring dockerfile: 2.46kB                                                                                                                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                       0.0s
 => => transferring context: 1.05kB                                                                                                                                                                                                     0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:21.12-py3                                                                                                                                                                       0.0s
 => CACHED [base 1/1] FROM nvcr.io/nvidia/pytorch:21.12-py3                                                                                                                                                                             0.0s
 => [internal] load build context                                                                                                                                                                                                       0.5s
 => => transferring context: 26.61MB                                                                                                                                                                                                    0.4s
 => CACHED [torch-tensorrt-builder-base 1/5] RUN rm -rf /opt/torch-tensorrt /usr/bin/bazel                                                                                                                                              0.0s
 => CACHED [torch-tensorrt-builder-base 2/5] RUN [[ "amd64" == "amd64" ]] && ARCH="x86_64" || ARCH="amd64"  && wget -q https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-linux-x86_64 -O /usr/bin/bazel  && chmo  0.0s
 => CACHED [torch-tensorrt-builder-base 3/5] RUN touch /usr/lib/$HOSTTYPE-linux-gnu/libnvinfer_static.a                                                                                                                                 0.0s
 => CACHED [torch-tensorrt-builder-base 4/5] RUN rm -rf /usr/local/cuda/lib* /usr/local/cuda/include   && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/lib /usr/local/cuda/lib64   && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux  0.0s
 => CACHED [torch-tensorrt-builder-base 5/5] RUN apt-get update && apt-get install -y --no-install-recommends locales ninja-build && rm -rf /var/lib/apt/lists/* && locale-gen en_US.UTF-8                                              0.0s
 => [torch-tensorrt-builder 1/4] COPY . /workspace/torch_tensorrt/src                                                                                                                                                                   0.2s
 => [torch-tensorrt  1/11] COPY . /workspace/torch_tensorrt                                                                                                                                                                             0.1s
 => [torch-tensorrt-builder 2/4] WORKDIR /workspace/torch_tensorrt/src                                                                                                                                                                  0.0s
 => [torch-tensorrt-builder 3/4] RUN cp ./docker/WORKSPACE.docker WORKSPACE                                                                                                                                                             0.3s
 => ERROR [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh                                                                                                                                                                       0.3s
------
 > [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh:
#15 0.272 /bin/bash: ./docker/dist-build.sh: /bin/bash^M: bad interpreter: No such file or directory
------
executor failed running [/bin/sh -c ./docker/dist-build.sh]: exit code: 126

To solve this issue, I followed the suggestion here and ran:

  1. sed -i -e 's/\r$//' scriptname.sh
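The `^M` in the first error is a carriage return left over from Windows (CRLF) line endings, which is what the sed command removes. As a rough sketch of the same fix applied to every shell script in a directory rather than one file at a time (the `strip_crlf` helper name is my own invention, and GNU sed is assumed):

```shell
# Hypothetical helper (not part of Torch-TensorRT): remove the trailing
# carriage return from every line of each *.sh file under the directory
# passed as the first argument. Assumes GNU sed, which understands \r.
strip_crlf() {
  find "$1" -type f -name '*.sh' -exec sed -i -e 's/\r$//' {} +
}

# Example, run from the root of the Torch-TensorRT checkout:
#   strip_crlf docker
```

This only touches files ending in `.sh`; any other script with CRLF line endings would still need the same treatment.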

Then, I retried with

  1. docker build --build-arg BASE=21.12 -f docker/Dockerfile -t torch_tensorrt:latest .

And this time the error was:

[+] Building 118.9s (15/25)
 => [internal] load build definition from Dockerfile                                                                                                                                                                                    0.0s
 => => transferring dockerfile: 32B                                                                                                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                       0.0s
 => => transferring context: 35B                                                                                                                                                                                                        0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:21.12-py3                                                                                                                                                                       0.0s
 => CACHED [base 1/1] FROM nvcr.io/nvidia/pytorch:21.12-py3                                                                                                                                                                             0.0s
 => [internal] load build context                                                                                                                                                                                                       0.2s
 => => transferring context: 48.70kB                                                                                                                                                                                                    0.2s
 => CACHED [torch-tensorrt-builder-base 1/5] RUN rm -rf /opt/torch-tensorrt /usr/bin/bazel                                                                                                                                              0.0s
 => CACHED [torch-tensorrt-builder-base 2/5] RUN [[ "amd64" == "amd64" ]] && ARCH="x86_64" || ARCH="amd64"  && wget -q https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-linux-x86_64 -O /usr/bin/bazel  && chmo  0.0s
 => CACHED [torch-tensorrt-builder-base 3/5] RUN touch /usr/lib/$HOSTTYPE-linux-gnu/libnvinfer_static.a                                                                                                                                 0.0s
 => CACHED [torch-tensorrt-builder-base 4/5] RUN rm -rf /usr/local/cuda/lib* /usr/local/cuda/include   && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/lib /usr/local/cuda/lib64   && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux  0.0s
 => CACHED [torch-tensorrt-builder-base 5/5] RUN apt-get update && apt-get install -y --no-install-recommends locales ninja-build && rm -rf /var/lib/apt/lists/* && locale-gen en_US.UTF-8                                              0.0s
 => [torch-tensorrt-builder 1/4] COPY . /workspace/torch_tensorrt/src                                                                                                                                                                   0.1s
 => [torch-tensorrt  1/11] COPY . /workspace/torch_tensorrt                                                                                                                                                                             0.1s
 => [torch-tensorrt-builder 2/4] WORKDIR /workspace/torch_tensorrt/src                                                                                                                                                                  0.0s
 => [torch-tensorrt-builder 3/4] RUN cp ./docker/WORKSPACE.docker WORKSPACE                                                                                                                                                             0.2s
 => ERROR [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh                                                                                                                                                                     118.0s
------
 > [torch-tensorrt-builder 4/4] RUN ./docker/dist-build.sh:
#15 2.846 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#15 2.846 running bdist_wheel
#15 2.888 Extracting Bazel installation...
#15 5.161 Starting local Bazel server and connecting to it...
#15 6.413 Loading:
#15 6.416 Loading: 0 packages loaded
#15 7.420 Loading: 0 packages loaded
#15 8.415 Analyzing: target //:libtorchtrt (1 packages loaded, 0 targets configured)
#15 9.421 Analyzing: target //:libtorchtrt (35 packages loaded, 75 targets configured)
#15 10.08 INFO: Analyzed target //:libtorchtrt (43 packages loaded, 2967 targets configured).
#15 10.08 INFO: Found 1 target...
#15 10.14 [0 / 117] [Prepa] Writing file cpp/lib/libtorchtrt.so-2.params
#15 11.14 [160 / 465] [Prepa] action 'SolibSymlink _solib_k8/_U@cuda_S_S_Ccublas___Ulib64/libcublas.so' ... (2 actions, 0 running)
#15 12.43 [629 / 731] [Prepa] action 'SolibSymlink _solib_k8/_U@libtorch_S_S_Ctorch___Ulib/libtorch_cpu.so' ... (4 actions, 3 running)
#15 13.44 [631 / 731] Compiling core/util/trt_util.cpp; 1s processwrapper-sandbox ... (5 actions running)
#15 14.52 [631 / 731] Compiling core/util/trt_util.cpp; 2s processwrapper-sandbox ... (5 actions running)
#15 17.73 [631 / 731] Compiling core/util/trt_util.cpp; 5s processwrapper-sandbox ... (6 actions, 5 running)
#15 19.67 [632 / 731] Compiling core/util/trt_util.cpp; 7s processwrapper-sandbox ... (6 actions, 5 running)
#15 22.03 [633 / 731] Compiling core/util/trt_util.cpp; 10s processwrapper-sandbox ... (6 actions, 5 running)
#15 25.15 [634 / 731] Compiling core/util/trt_util.cpp; 13s processwrapper-sandbox ... (6 actions, 5 running)
#15 29.72 [637 / 731] Compiling core/util/trt_util.cpp; 17s processwrapper-sandbox ... (6 actions, 5 running)
#15 50.81 [637 / 731] Compiling core/util/trt_util.cpp; 36s processwrapper-sandbox ... (6 actions, 5 running)
#15 73.30 [637 / 731] Compiling core/util/trt_util.cpp; 59s processwrapper-sandbox ... (6 actions, 5 running)
#15 83.28 [637 / 731] Compiling core/util/trt_util.cpp; 70s processwrapper-sandbox ... (6 actions, 5 running)
#15 104.4 [637 / 731] Compiling core/util/trt_util.cpp; 91s processwrapper-sandbox ... (6 actions, 5 running)
#15 113.1 ERROR: /workspace/torch_tensorrt/src/core/plugins/BUILD:10:11: Compiling core/plugins/impl/interpolate_plugin.cpp failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 62 argument(s) skipped)
#15 113.1
#15 113.1 Use --sandbox_debug to see verbose messages from the sandbox
#15 113.1 gcc: fatal error: Killed signal terminated program cc1plus
#15 113.1 compilation terminated.
#15 114.0 Target //:libtorchtrt failed to build
#15 114.0 Use --verbose_failures to see the command lines of failed build steps.
#15 114.1 INFO: Elapsed time: 111.118s, Critical Path: 101.93s
#15 114.1 INFO: 643 processes: 637 internal, 6 processwrapper-sandbox.
#15 114.1 FAILED: Build did NOT complete successfully
#15 114.2 FAILED: Build did NOT complete successfully
#15 114.3 using CXX11 ABI build
#15 114.3 building libtorchtrt
------
executor failed running [/bin/sh -c ./docker/dist-build.sh]: exit code: 1

What am I doing wrong? It may be something completely trivial, since I have no experience with Docker.

Expected behavior

No errors.

Environment

  • Torch-TensorRT Version (e.g. 1.0.0): 1.0.0 (latest)
  • PyTorch Version (e.g. 1.0): 1.10
  • CPU Architecture: AMD64
  • OS: Windows 10
  • How you installed PyTorch: pip & LibTorch
  • Python version: 3.9.9
  • CUDA version: 10.2
  • GPU models and configuration: GeForce RTX 2080

Hi,

We recommend posting your concern on Issues · pytorch/TensorRT · GitHub to get better help.

Thank you.