[BUG] failed to start docker container in orin target with error: failed to create endpoint on network bridge, operation not supported

Required Info:

  • Software Version
    DRIVE OS 6.0.8.1
  • Target OS
    Linux
  • SDK Manager Version
    1.9.2.10884
  • Host Machine Version
    native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers

Describe the bug

Following the documentation (Docker Services | NVIDIA Docs), I tried to use the Docker service on the Orin target with 6.0.8.1.
Running a Docker container failed.

To Reproduce

cd /usr/local/cuda-11.4/samples/1_Utilities/deviceQuery/ && sudo make
./deviceQuery
sudo docker run --rm --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./deviceQuery

Expected behavior

This command might take a few moments, as it needs to pull the Ubuntu 20.04
Docker image and start a Docker terminal session.

Actual behavior

docker: Error response from daemon: failed to create endpoint frosty_satoshi on network bridge: failed to add the host (vethcbf8a3c) <=> sandbox (veth3fdb927) pair interfaces: operation not supported.

Additional context

# Rebooting has no effect.

modinfo veth
modinfo: ERROR: Module veth not found.

sudo modprobe veth
modprobe: FATAL: Module veth not found in directory /lib/modules/5.15.98-rt-tegra
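The modprobe failure above indicates the shipped kernel was built without veth support. As a sanity check, the relevant Kconfig symbols can be grepped out of a kernel config file. A minimal sketch: the option names (CONFIG_VETH, CONFIG_NAMESPACES, CONFIG_USER_NS) are standard kernel Kconfig symbols, but the sample config text is made up for illustration, not the actual tegra_defconfig:

```python
# Check a kernel config for the options Docker's bridge networking needs.
# The option names are standard kernel Kconfig symbols; the sample config
# text below is illustrative only.

REQUIRED = ("CONFIG_VETH", "CONFIG_NAMESPACES", "CONFIG_USER_NS")

def missing_options(config_text: str):
    """Return the required options not enabled (=y or =m) in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                enabled.add(name)
    return [opt for opt in REQUIRED if opt not in enabled]

sample = """\
CONFIG_NAMESPACES=y
# CONFIG_USER_NS is not set
CONFIG_BRIDGE=m
"""
print(missing_options(sample))  # CONFIG_VETH and CONFIG_USER_NS are missing
```

On a target, the same check could be pointed at the decompressed /proc/config.gz or the out-linux/.config produced by the kernel build.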

reference:

Dear @lizhensheng,
I could reproduce the issue with docker container. I am looking into and update you asap.


Dear @lizhensheng,
To run docker services on the Target, developers must follow the steps to rebuild the kernel and filesystem per the instructions in Compiling the Kernel (Kernel 5.15) | NVIDIA Docs
with the following modifications:

  1. For step 4, include libncurses-dev and vim in the apt-get install command.
  2. For step 6, use defconfig.
  3. For step 7, after starting the menuconfig with make -C kernel O=${PWD}/out-linux menuconfig, press ENTER to select General setup. Navigate to Namespaces support and press the space bar to enable. Press ENTER to select Namespaces support. Navigate to User namespace and press the space bar to enable. Save the configuration and exit the menuconfig. Save the menuconfig with the make -C kernel O=${PWD}/out-linux savedefconfig command. When copying the updated defconfig, use cp ${PWD}/out-linux/defconfig kernel/arch/arm64/configs/tegra_defconfig.
  4. For step 11.d, replace instances of <kernel_version> in the code snippet with 5.15.98-rt-tegra before executing the commands.
  5. For step 12.c.i, replace the <fstype> string with standard before saving the new update_rfs.CONFIG.json file. Before running Build-FS per #12.c.i.2, execute the following snippet to remove all YAML entries for kernel modules under the updates/dkms path:
python3 -B - << END
import yaml

manifest = '/drive/drive-linux/filesystem/copytarget/manifest/copytarget-kernel-modules.yaml'

# Load the copytarget kernel-modules manifest
with open(manifest, 'r') as f:
    data = yaml.safe_load(f)

# Drop every fileList entry destined for an updates/dkms path
data['fileList'] = [entry for entry in data['fileList']
                    if 'updates/dkms' not in entry['destination']]

with open(manifest, 'w') as f:
    yaml.dump(data, f)
END
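The filtering rule in the snippet above can be exercised standalone. This sketch applies the same rule to an in-memory manifest; the sample fileList entries are made up, not taken from the real copytarget YAML:

```python
# Sketch of the manifest filtering on in-memory data. The sample entries
# below are illustrative; the real manifest lives at
# /drive/drive-linux/filesystem/copytarget/manifest/copytarget-kernel-modules.yaml.

def drop_dkms_entries(manifest: dict) -> dict:
    """Remove fileList entries destined for an updates/dkms path (in place)."""
    manifest["fileList"] = [
        entry for entry in manifest["fileList"]
        if "updates/dkms" not in entry["destination"]
    ]
    return manifest

manifest = {
    "fileList": [
        {"destination": "/lib/modules/5.15.98-rt-tegra/kernel/net/veth.ko"},
        {"destination": "/lib/modules/5.15.98-rt-tegra/updates/dkms/nvgpu.ko"},
    ]
}
drop_dkms_entries(manifest)
print(len(manifest["fileList"]))  # only the non-dkms entry survives
```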

Let me know if it works for you.

Thanks @SivaRamaKrishnaNV

Do these steps have to be done by developers on DRIVE AGX Orin?
Or is this a bug or a feature?

I would prefer not to do this in DRIVE OS 6.0.8.1 and instead expect a fix in the next version. Am I right?

Dear @lizhensheng ,
Do these steps have to be done by developers on DRIVE AGX Orin?
Or is this a bug or a feature?

This has to be done in the host Docker container, and then the target must be re-flashed. The primary kernel released with 6.0.8.1 does not provide namespace support by default. The names of GPU-related devices have also been modified within the filesystem. Anyone looking to run Docker containers will experience a "namespace creation reexec command failed" error. We have provided the WAR (workaround) steps above.
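A quick way to confirm whether the running kernel exposes the namespaces Docker needs is to look at /proc/self/ns, which only contains an entry for each namespace type compiled into the kernel. A minimal sketch (standard Linux interfaces, nothing DRIVE-specific assumed):

```shell
# Each file under /proc/self/ns exists only when the corresponding
# namespace type is compiled into the running kernel.
for ns in net user mnt pid; do
    if [ -e "/proc/self/ns/$ns" ]; then
        echo "$ns namespace: supported"
    else
        echo "$ns namespace: MISSING"
    fi
done
```

On the stock 6.0.8.1 kernel the user entry would be missing, which is consistent with the rebuild steps above that enable User namespace in menuconfig.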


We have fixed this issue, and Docker should run on the target in the next release.


Tested this on the recent release, and it is working.

nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --privileged --network host --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
[sudo] password for nvidia:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 618.30 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.