I am running a p3.2xlarge EC2 instance with a Tesla V100-SXM2-16GB GPU, launched from the prebuilt NVIDIA GPU-Optimized AMI: https://aws.amazon.com/marketplace/pp/prodview-7ikjtg3um26wq
I get an error when I run the ./docker/build_container_release.sh script from [Getting Started with Morpheus - NVIDIA Docs]. The failure seems to be related to conda / mamba.
ubuntu@ip-172-31-47-174:~/morpheus$ ./docker/build_container_release.sh
Building morpheus: with args...
CUDA_MAJOR_VER : 12
CUDA_MINOR_VER : 5
CUDA_REV_VER : 1
FROM_IMAGE : nvidia/cuda
LINUX_DISTRO : ubuntu
LINUX_VER : 22.04
MORPHEUS_ROOT_HOST : .
MORPHEUS_SUPPORT_DOCA : OFF
MORPHEUS_BUILD_MORPHEUS_LLM: ON
PYTHON_VER : 3.10
COMMAND: docker build -t nvcr.io/nvidia/morpheus/morpheus:v25.02.00a-runtime --target runtime --build-arg CUDA_MAJOR_VER=12 --build-arg CUDA_MINOR_VER=5 --build-arg CUDA_REV_VER=1 --build-arg FROM_IMAGE=nvidia/cuda --build-arg LINUX_DISTRO=ubuntu --build-arg LINUX_VER=22.04 --build-arg MORPHEUS_ROOT_HOST=. --build-arg MORPHEUS_SUPPORT_DOCA=OFF --build-arg MORPHEUS_BUILD_MORPHEUS_LLM=ON --build-arg MORPHEUS_BUILD_MORPHEUS_DFP=ON --build-arg PYTHON_VER=3.10 --network=host -f /home/ubuntu/morpheus/docker/Dockerfile .
....
> ERROR [conda_bld_morpheus 3/3] RUN --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked --mount=type=cache,id=con 528.1s
=> CACHED [git_clone 1/1] RUN --mount=type=bind,source=.,target=/opt/host_repo source activate morpheus && git clone file:///opt/host_repo/. 0.0s
------
> [conda_bld_morpheus 3/3] RUN --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked --mount=type=cache,id=conda_pkgs,target=/opt/conda/pkgs,sharing=locked /opt/conda/bin/mamba install -y -n base -c conda-forge "git-lfs" && source activate base && git lfs install && git config --global --add safe.directory "*" && cd . && MORPHEUS_PYTHON_BUILD_STUBS=OFF CONDA_BLD_PATH=/opt/conda/conda-bld ./ci/conda/recipes/run_conda_build.sh morpheus:
1.293 warning libmamba Cache file "/opt/conda/pkgs/cache/497deca9.json" was modified by another program
1.294 warning libmamba Cache file "/opt/conda/pkgs/cache/09cdf8bf.json" was modified by another program
24.35 Transaction
24.35
24.35 Prefix: /opt/conda
24.35
24.35 Updating specs:
24.35
24.35 - git-lfs
24.35 - ca-certificates
24.35 - certifi
24.35 - openssl
24.35
24.35
24.35 Package Version Build Channel Size
24.35 ──────────────────────────────────────────────────────────
24.35 Install:
24.35 ──────────────────────────────────────────────────────────
24.35
24.35 + git-lfs 3.6.0 h647637d_0 conda-forge Cached
24.35
24.35 Summary:
24.35
24.35 Install: 1 packages
24.35
24.35 Total download: 0 B
24.35
24.35 ──────────────────────────────────────────────────────────
24.35
24.35
24.85
24.85 Looking for: ['git-lfs']
24.85
24.85
24.85 Pinned packages:
24.85 - python 3.10.*
24.85
24.85
24.85
24.85 Downloading and Extracting Packages: ...working... done
24.85 Preparing transaction: done
24.95 Verifying transaction: done
25.05 Executing transaction: done
25.71 Updated Git hooks.
25.71 Git LFS initialized.
25.76 Running conda-build for morpheus vv25.02.00a...
25.76 ++ conda mambabuild --use-local --build-id-pat '{n}-{v}' -c conda-forge -c huggingface -c rapidsai -c rapidsai-nightly -c nvidia -c nvidia/label/dev -c pytorch -c defaults ci/conda/recipes/morpheus
26.77 INFO:conda_index.index.convert_cache:Migrate database
...
515.9 ninja: build stopped: subcommand failed.
517.5 Traceback (most recent call last):
517.5 File "/opt/conda/bin/conda-mambabuild", line 10, in <module>
517.5 sys.exit(main())
517.5 File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 301, in main
517.5 call_conda_build(action, config)
517.5 File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 273, in call_conda_build
517.5 result = api.build(
517.5 File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 250, in build
517.5 return build_tree(
517.5 File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3638, in build_tree
517.5 packages_from_this = build(
517.5 File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2711, in build
517.5 newly_built_packages = bundlers[pkg_type](output_d, m, env, stats)
517.5 File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 1784, in bundle_conda
517.5 utils.check_call_env(
517.5 File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 405, in check_call_env
517.5 return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
517.5 File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 381, in _func_defaulting_env_to_os_environ
517.5 raise subprocess.CalledProcessError(proc.returncode, _args)
517.5 subprocess.CalledProcessError: Command '['/usr/bin/bash', '-e', '/opt/conda/conda-bld/morpheus-split-25.02.00a/work/morpheus_build.sh']' returned non-zero exit status 1.
------
Dockerfile:264
--------------------
263 |
264 | >>> RUN --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked \
265 | >>> --mount=type=cache,id=conda_pkgs,target=/opt/conda/pkgs,sharing=locked \
266 | >>> # Install git-lfs before running the build to avoid errors during conda build
267 | >>> /opt/conda/bin/mamba install -y -n base -c conda-forge "git-lfs" &&\
268 | >>> source activate base &&\
269 | >>> git lfs install &&\
270 | >>> # Need to get around recent versions of git locking paths until they are deemed safe
271 | >>> git config --global --add safe.directory "*" &&\
272 | >>> # Change to the morpheus directory and build the conda package
273 | >>> cd ${MORPHEUS_ROOT_HOST} &&\
274 | >>> MORPHEUS_PYTHON_BUILD_STUBS=OFF CONDA_BLD_PATH=/opt/conda/conda-bld ./ci/conda/recipes/run_conda_build.sh morpheus
275 |
--------------------
ERROR: failed to solve: process "/bin/bash -c /opt/conda/bin/mamba install -y -n base -c conda-forge \"git-lfs\" && source activate base && git lfs install && git config --global --add safe.directory \"*\" && cd ${MORPHEUS_ROOT_HOST} && MORPHEUS_PYTHON_BUILD_STUBS=OFF CONDA_BLD_PATH=/opt/conda/conda-bld ./ci/conda/recipes/run_conda_build.sh morpheus" did not complete successfully: exit code: 1
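The traceback above only reports that morpheus_build.sh exited with a non-zero status. When ninja stops, it normally prints a `FAILED:` line followed by the failing command and its compiler output, and those lines would sit in the part of the log elided above. Below is a minimal sketch for pulling them out, assuming the complete build output has been saved to a file first (for example with `./docker/build_container_release.sh 2>&1 | tee build.log`). The file name `build.log` and the helper name `find_ninja_failures` are hypothetical and are not part of Morpheus.

```python
import re
import sys
from typing import List

# BuildKit prefixes log lines with elapsed seconds ("515.9 ...") and, in
# plain-progress mode, also with a step id ("#23 515.9 ..."). Strip both.
_PREFIX = re.compile(r"^(?:#\d+\s+)?\d+(?:\.\d+)?\s+")


def find_ninja_failures(log_text: str, context: int = 20) -> List[str]:
    """Return each ninja 'FAILED:' line plus up to `context` following lines."""
    lines = [_PREFIX.sub("", line, count=1) for line in log_text.splitlines()]
    hits = []
    for i, line in enumerate(lines):
        if line.startswith("FAILED:"):
            hits.append("\n".join(lines[i:i + 1 + context]))
    return hits


# Only runs when a log path is given on the command line, e.g.:
#   python find_failures.py build.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for block in find_ninja_failures(fh.read()):
            print(block)
            print("-" * 40)
```

The printed blocks should show which target failed to compile and the first compiler or linker error, which is more useful for diagnosis than the conda-build traceback.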
GPU and driver details:
ubuntu@ip-172-31-47-174:~$ nvidia-smi
Wed Nov 27 14:42:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB On | 00000000:00:1E.0 Off | 0 |
| N/A 30C P0 23W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+