Failed building: nccl

Problem

Building the nccl Docker container on a Jetson Orin Nano fails.

Step 7/7 : RUN /tmp/nccl/install.sh
 ---> Running in 7339593e525c
+ echo 'Installing NVIDIA NCCL 2.27.7 (NCCL)'
+ [[ tegra-aarch64 == \a\a\r\c\h\6\4 ]]
+ DEB=nccl-local-repo-ubuntu2204-2.27.7-cuda13.0_1.0-1_amd64.debdeb
+ cd /tmp/nccl
Installing NVIDIA NCCL 2.27.7 (NCCL)
+ wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate https://apt.jetson-ai-lab.io/multiarch/nccl-local-repo-ubuntu2204-2.27.7-cuda13.0_1.0-1_amd64.debdeb
The command '/bin/sh -c /tmp/nccl/install.sh' returned a non-zero code: 8
[20:23:42] Failed building:  nccl

Traceback (most recent call last):
  File "/ssd/data/jetson-containers/jetson_containers/build.py", line 129, in <module>
    build_container(**vars(args))
  File "/ssd/data/jetson-containers/jetson_containers/container.py", line 225, in build_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host   --tag nccl:r36.4.tegra-aarch64-cu126-22.04-nccl   --file /ssd/data/jetson-containers/packages/cuda/nccl/Dockerfile   --build-arg BASE_IMAGE=nccl:r36.4.tegra-aarch64-cu126-22.04-cuda   --build-arg NCCL_VERSION="2.27.7"   --build-arg IS_SBSA="False"   --build-arg CUDA_ARCH="tegra-aarch64"   --build-arg DISTRO="ubuntu2204"    /ssd/data/jetson-containers/packages/cuda/nccl 2>&1 | tee /ssd/data/jetson-containers/logs/20250819_201655/build/04o4_nccl_r36.4.tegra-aarch64-cu126-22.04-nccl.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 8.

Target system

Software part of jetson-stats 4.3.2 - (c) 2024, Raffaello Bonghi
Jetpack missing!
 - Model: NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super
 - L4T: 36.4.4
NV Power Mode[2]: MAXN_SUPER
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3767-0005
 - Module: NVIDIA Jetson Orin Nano (Developer kit)
Platform:
 - Distribution: Ubuntu 22.04 Jammy Jellyfish
 - Release: 5.15.148-tegra
jtop:
 - Version: 4.3.2
 - Service: Active
Libraries:
 - CUDA: 12.6.68
 - cuDNN: 1.0
 - TensorRT: 10.3.0.30
 - VPI: 3.2.4
 - Vulkan: 1.3.204
 - OpenCV: 4.8.0 - with CUDA: NO

Command

$ jetson-containers build nccl

Remarks

Examining the install.sh script shows that it fails on this command: wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate https://apt.jetson-ai-lab.dev/multiarch/nccl-local-repo-ubuntu2204-2.27.7-cuda13.0_1.0-1_amd64.debdeb

I did set INDEX_HOST=jetson-ai-lab.io in the .env file.
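
The build trace points at the likely cause: the test `[[ tegra-aarch64 == aarch64 ]]` does not match, so install.sh falls through to its else branch and requests the amd64 file with the doubled `.debdeb` suffix. wget exit status 8 means the server returned an error response, which fits a file that is not on the server. A minimal sketch of that selection logic, reconstructed from the trace above (the function name `select_deb` is hypothetical and not part of jetson-containers):

```shell
#!/usr/bin/env bash
# Sketch of the filename selection seen in the build trace.
# select_deb is a hypothetical helper, not a jetson-containers function.
select_deb() {
  local cuda_arch="$1" distro="$2" nccl_version="$3"
  if [[ "$cuda_arch" == "aarch64" ]]; then
    echo "nccl-local-repo-${distro}-${nccl_version}-cuda13.0_1.0-1_arm64.deb"
  else
    # Any other value, including tegra-aarch64, lands here.
    echo "nccl-local-repo-${distro}-${nccl_version}-cuda13.0_1.0-1_amd64.debdeb"
  fi
}

# Values taken from the --build-arg flags of the failing docker build command.
select_deb "tegra-aarch64" "ubuntu2204" "2.27.7"
```

With CUDA_ARCH=tegra-aarch64 this prints the same amd64 `.debdeb` name that appears in the wget line of the log.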

*** Please note that this reply is generated by LLM automatically ***

Unfortunately, the provided context does not contain a specific question about building an NCCL Docker container on Jetson Orin Nano. However, I can provide a general answer based on the context.

Building an NCCL (NVIDIA Collective Communication Library) Docker container on Jetson Orin Nano can be a complex process, and failures can occur due to various reasons. Here are some general steps and potential solutions to help you troubleshoot the issue:

  1. Verify the environment variables: Make sure that the INDEX_HOST environment variable is set correctly to jetson-ai-lab.io in the .env file.
  2. Check the Docker version: Ensure that you are using a compatible Docker version on your Jetson Orin Nano. You can check the Docker version by running docker --version.
  3. Verify the NCCL version: Check the NCCL version you are trying to build. Ensure that it is compatible with your Jetson Orin Nano and the Docker version you are using.
  4. Check the build script: Examine the install.sh script to identify the specific command that is failing. Check the script’s syntax and ensure that all dependencies are installed.
  5. Dependency issues: Verify that all dependencies required by the NCCL container are installed on your Jetson Orin Nano. You can check the dependencies by running apt list --installed or dpkg -l.
  6. Docker build issues: If the issue persists, try building the NCCL container using the docker build command with the --no-cache option to force a rebuild.
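
The first two checks in the list could be scripted roughly as follows. This is only a sketch: `read_env_value` is a hypothetical helper, and the demonstration runs against a throwaway file rather than a real .env, whose location is not stated in this thread.

```shell
#!/usr/bin/env bash
# read_env_value is a hypothetical helper: it prints the value assigned to KEY
# in a file of KEY=VALUE lines (last assignment wins, empty if absent).
read_env_value() {
  local key="$1" file="$2"
  grep -E "^${key}=" "$file" 2>/dev/null | tail -n 1 | cut -d= -f2-
}

# Demonstration on a temporary file so the sketch does not touch a real .env.
demo_env="$(mktemp)"
printf 'INDEX_HOST=jetson-ai-lab.io\n' > "$demo_env"
echo "INDEX_HOST is: $(read_env_value INDEX_HOST "$demo_env")"
rm -f "$demo_env"

# Report the Docker version only if docker is installed.
if command -v docker >/dev/null 2>&1; then
  docker --version
fi
```

To inspect a real setup, pass the path of the actual .env file as the second argument.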

Comparing with other issues in the context, there are several other problems related to Docker and containerization on Jetson devices, such as:

  • Error with “Nvidia Container Runtime with Docker Integration” on AGX Orin with JP6.2: This issue is related to a problem with the NVIDIA Container Runtime on Jetson AGX Orin with JetPack 6.2.
  • Which container - jetson container or nvidia container?: This question is related to choosing between Jetson and NVIDIA containers for a specific use case.
  • JetPack 6.0 docker integration broken exposing ports: This issue is related to a problem with Docker integration on Jetson AGX Xavier with JetPack 6.0.

In summary, building an NCCL Docker container on Jetson Orin Nano can be a complex process, and failures can occur due to various reasons. By verifying the environment variables, Docker version, NCCL version, build script, dependencies, and Docker build issues, you can troubleshoot and potentially resolve the problem.

Thank you for your response. The post is merely a confirmation of a build error. The question is how to solve it, or better, whether anyone already has a solution at hand, which would save time. If no solution is available, I may come up with one myself.

I created a bug report on the appropriate git repository. Indeed, it is a complex problem. Since there is no documentation and the source is the documentation, I have to make time to look into it. I have to think about whether this one is worth the bother.

I’m new to the Jetson platform, so I have to invest a reasonable amount of time in it.

Hi,

The package belongs to the multiarch section.
Since we only support JetPack 6 and 7 now, it doesn’t exist on the jetson-ai-lab.io server.

Instead, you can download the file directly from our website:

The corresponding change in the jetson-containers source should be:

diff --git a/packages/cuda/nccl/Dockerfile b/packages/cuda/nccl/Dockerfile
index b4bf6953..d4604e7d 100644
--- a/packages/cuda/nccl/Dockerfile
+++ b/packages/cuda/nccl/Dockerfile
@@ -29,5 +29,6 @@ RUN apt-get update -y && \
     rm -rf /var/lib/apt/lists/*
 
 COPY install.sh /tmp/nccl/
+COPY nccl-local-repo-ubuntu2404-2.27.7-cuda13.0_1.0-1_arm64.deb /tmp/nccl/nccl-local-repo-ubuntu2404-2.27.7-cuda13.0_1.0-1_arm64.deb
 RUN /tmp/nccl/install.sh
 
diff --git a/packages/cuda/nccl/install.sh b/packages/cuda/nccl/install.sh
index 075b0e5c..fba89765 100755
--- a/packages/cuda/nccl/install.sh
+++ b/packages/cuda/nccl/install.sh
@@ -1,17 +1,11 @@
 #!/usr/bin/env bash
 set -ex
 echo "Installing NVIDIA NCCL $NCCL_VERSION (NCCL)"
-if [[ "$CUDA_ARCH" == "aarch64" ]]; then
-  DEB="nccl-local-repo-${DISTRO}-${NCCL_VERSION}-cuda13.0_1.0-1_arm64.deb"
-else
-  DEB="nccl-local-repo-${DISTRO}-${NCCL_VERSION}-cuda13.0_1.0-1_amd64.debdeb"
-fi
+
+DEB="nccl-local-repo-${DISTRO}-${NCCL_VERSION}-cuda13.0_1.0-1_arm64.deb"
 cd $TMP
-wget $WGET_FLAGS $MULTIARCH_URL/$DEB
-if [[ "$CUDA_ARCH" != "tegra-aarch64" ]]; then
-    dpkg -i $DEB
-    sudo cp /var/nccl-local-repo-ubuntu2404-2.27.7-cuda13.0/nccl-local-190A5319-keyring.gpg /usr/share/keyrings/
-    apt-get update
-    dpkg -i $DEB
-    apt-get -y install libnccl2 libnccl-dev
-fi
+
+dpkg -i $DEB
+sudo cp /var/nccl-local-repo-ubuntu2404-2.27.7-cuda13.0/nccl-local-190A5319-keyring.gpg /usr/share/keyrings/
+apt-get update
+apt-get -y install libnccl2 libnccl-dev
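
Because the patched Dockerfile uses COPY, the downloaded .deb has to sit inside the Docker build context, which according to the build command in the log is the packages/cuda/nccl directory. Note also that install.sh derives the file name from DISTRO, and the log shows DISTRO=ubuntu2204 while the COPY line names an ubuntu2404 file, so it may be worth confirming that the two agree in your setup. A small pre-build check, as a sketch (`check_deb` is a hypothetical helper; the path and values are copied from the log and may differ on your system):

```shell
#!/usr/bin/env bash
# check_deb is a hypothetical helper, not part of jetson-containers.
# It rebuilds the name the patched install.sh would compute and checks that
# a file with that name exists in the build context directory.
check_deb() {
  local pkg_dir="$1" distro="$2" nccl_version="$3"
  local deb="nccl-local-repo-${distro}-${nccl_version}-cuda13.0_1.0-1_arm64.deb"
  if [[ -f "${pkg_dir}/${deb}" ]]; then
    echo "found: ${pkg_dir}/${deb}"
    return 0
  fi
  echo "missing: ${pkg_dir}/${deb}"
  return 1
}

# Assumed values, copied from the failing build command in the log.
PKG_DIR="${PKG_DIR:-/ssd/data/jetson-containers/packages/cuda/nccl}"
if check_deb "$PKG_DIR" "ubuntu2204" "2.27.7"; then
  echo "ready to run: jetson-containers build nccl"
else
  echo "place the arm64 .deb in $PKG_DIR (or adjust DISTRO) before rebuilding"
fi
```

Set PKG_DIR in the environment if the checkout lives somewhere other than the path shown in the log.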

Thanks.

Thank you for this quick solution!