[BUG] failed to start docker container in orin target with error: failed to create endpoint on network bridge, operation not supported

Required Info:

  • Software Version
    DRIVE OS 6.0.8.1
  • Target OS
    Linux
  • SDK Manager Version
    1.9.2.10884
  • Host Machine Version
    native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers

Describe the bug

Following the documentation (Docker Services | NVIDIA Docs), I tried to use the Docker service on the Orin target with 6.0.8.1.
Running a Docker container failed.

To Reproduce

cd /usr/local/cuda-11.4/samples/1_Utilities/deviceQuery/ && sudo make
./deviceQuery
sudo docker run --rm --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./deviceQuery

Expected behavior

This command might take a few moments, as it needs to pull the Ubuntu 20.04
Docker image and start a Docker terminal session.

Actual behavior

docker: Error response from daemon: failed to create endpoint frosty_satoshi on network bridge: failed to add the host (vethcbf8a3c) <=> sandbox (veth3fdb927) pair interfaces: operation not supported.

Additional context

# Rebooting has no effect.

modinfo veth
modinfo: ERROR: Module veth not found.

sudo modprobe veth
modprobe: FATAL: Module veth not found in directory /lib/modules/5.15.98-rt-tegra
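The modprobe failure above indicates the shipped kernel was built without veth support. As a sanity check, the relevant Kconfig symbols can be grepped out of a kernel config file. A minimal sketch: the option names (CONFIG_VETH, CONFIG_NAMESPACES, CONFIG_USER_NS) are standard kernel Kconfig symbols, but the sample config text is made up for illustration, not the actual tegra_defconfig:

```python
# Check a kernel config for the options Docker's bridge networking needs.
# The option names are standard kernel Kconfig symbols; the sample config
# text below is illustrative only.

REQUIRED = ("CONFIG_VETH", "CONFIG_NAMESPACES", "CONFIG_USER_NS")

def missing_options(config_text: str):
    """Return the required options not enabled (=y or =m) in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                enabled.add(name)
    return [opt for opt in REQUIRED if opt not in enabled]

sample = """\
CONFIG_NAMESPACES=y
# CONFIG_USER_NS is not set
CONFIG_BRIDGE=m
"""
print(missing_options(sample))  # CONFIG_VETH and CONFIG_USER_NS are missing
```

On a target, the same check could be pointed at the decompressed /proc/config.gz or the out-linux/.config produced by the kernel build.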

reference:

Dear @lizhensheng,
I could reproduce the issue with docker container. I am looking into and update you asap.


Dear @lizhensheng,
To run docker services on the Target, developers must follow the steps to rebuild the kernel and filesystem per the instructions in Compiling the Kernel (Kernel 5.15) | NVIDIA Docs
with the following modifications:

  1. For step 4, include libncurses-dev and vim in the apt-get install command.
  2. For step 6, use defconfig.
  3. For step 7, after starting the menuconfig with make -C kernel O=${PWD}/out-linux menuconfig, press ENTER to select General setup. Navigate to Namespaces support and press the space bar to enable. Press ENTER to select Namespaces support. Navigate to User namespace and press the space bar to enable. Save the configuration and exit the menuconfig. Save the menuconfig with the make -C kernel O=${PWD}/out-linux savedefconfig command. When copying the updated defconfig, use cp ${PWD}/out-linux/defconfig kernel/arch/arm64/configs/tegra_defconfig.
  4. For step 11.d, replace instances of <kernel_version> in the code snippet with 5.15.98-rt-tegra before executing the commands.
  5. For step 12.c.i, replace the <fstype> string with standard before saving the new update_rfs.CONFIG.json file. Before running Build-FS per #12.c.i.2, execute the following snippet to remove all YAML entries for kernel modules under the updates/dkms path:
python3 -B - << END
import yaml

manifest = '/drive/drive-linux/filesystem/copytarget/manifest/copytarget-kernel-modules.yaml'

# Load the copytarget kernel-modules manifest
with open(manifest, 'r') as f:
    data = yaml.safe_load(f)

# Drop every fileList entry destined for an updates/dkms path
data['fileList'] = [entry for entry in data['fileList']
                    if 'updates/dkms' not in entry['destination']]

with open(manifest, 'w') as f:
    yaml.dump(data, f)
END
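The filtering rule in the snippet above can be exercised standalone. This sketch applies the same rule to an in-memory manifest; the sample fileList entries are made up, not taken from the real copytarget YAML:

```python
# Sketch of the manifest filtering on in-memory data. The sample entries
# below are illustrative; the real manifest lives at
# /drive/drive-linux/filesystem/copytarget/manifest/copytarget-kernel-modules.yaml.

def drop_dkms_entries(manifest: dict) -> dict:
    """Remove fileList entries destined for an updates/dkms path (in place)."""
    manifest["fileList"] = [
        entry for entry in manifest["fileList"]
        if "updates/dkms" not in entry["destination"]
    ]
    return manifest

manifest = {
    "fileList": [
        {"destination": "/lib/modules/5.15.98-rt-tegra/kernel/net/veth.ko"},
        {"destination": "/lib/modules/5.15.98-rt-tegra/updates/dkms/nvgpu.ko"},
    ]
}
drop_dkms_entries(manifest)
print(len(manifest["fileList"]))  # only the non-dkms entry survives
```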

Let me know if it works for you.

Thanks @SivaRamaKrishnaNV

Do these steps have to be done by developers on DRIVE AGX Orin?
Or is this a bug or a feature?

I would prefer not to do this in DRIVE OS 6.0.8.1 and instead expect a fix in the next version. Am I right?

Dear @lizhensheng ,
Do these steps have to be done by developers on DRIVE AGX Orin?
Or is this a bug or a feature?

This has to be done in the host Docker container, and then the target must be re-flashed. The primary kernel released with 6.0.8.1 does not provide namespace support by default. The names of GPU-related devices have also been modified within the filesystem. Anyone looking to run Docker containers will experience a "namespace creation reexec command failed" error. We have provided the WAR (workaround) steps above.
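A quick way to confirm whether the running kernel exposes the namespaces Docker needs is to look at /proc/self/ns, which only contains an entry for each namespace type compiled into the kernel. A minimal sketch (standard Linux interfaces, nothing DRIVE-specific assumed):

```shell
# Each file under /proc/self/ns exists only when the corresponding
# namespace type is compiled into the running kernel.
for ns in net user mnt pid; do
    if [ -e "/proc/self/ns/$ns" ]; then
        echo "$ns namespace: supported"
    else
        echo "$ns namespace: MISSING"
    fi
done
```

On the stock 6.0.8.1 kernel the user entry would be missing, which is consistent with the rebuild steps above that enable User namespace in menuconfig.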


We have fixed this issue, and Docker should run on the target in the next release.


Tested this on the recent release, and it is working.

nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --privileged --network host --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
[sudo] password for nvidia:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 618.30 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.