Host Machine Version
Native Ubuntu Linux 20.04 host installed with DRIVE OS Docker containers
Describe the bug
Following the documentation (Docker Services | NVIDIA Docs), I tried to use the Docker service on the Orin target with DRIVE OS 6.0.8.1. Running a Docker container failed.
To Reproduce
cd /usr/local/cuda-11.4/samples/1_Utilities/deviceQuery/ && sudo make
./deviceQuery
sudo docker run --rm --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./deviceQuery
Expected behavior
This command might take a few moments, as it needs to pull the Ubuntu 20.04 Docker image and start a Docker terminal session.
Actual behavior
docker: Error response from daemon: failed to create endpoint frosty_satoshi on network bridge: failed to add the host (vethcbf8a3c) <=> sandbox (veth3fdb927) pair interfaces: operation not supported.
Additional context
# Rebooting has no effect.
modinfo veth
modinfo: ERROR: Module veth not found.
sudo modprobe veth
modprobe: FATAL: Module veth not found in directory /lib/modules/5.15.98-rt-tegra
For step 4, include libncurses-dev and vim in the apt-get install command.
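If it helps, the two additional packages can also be installed on their own; a minimal sketch:

sudo apt-get update && sudo apt-get install -y libncurses-dev vim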
For step 6, use defconfig.
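Assuming the same out-linux build directory used in the step 7 command below, the step 6 invocation would be along these lines:

make -C kernel O=${PWD}/out-linux defconfig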
For step 7, after starting menuconfig with make -C kernel O=${PWD}/out-linux menuconfig, press ENTER to select General setup, navigate to Namespaces support, and press the space bar to enable it. Then press ENTER to select Namespaces support, navigate to User namespace, and press the space bar to enable it. Save the configuration and exit menuconfig. Save the resulting configuration with the make -C kernel O=${PWD}/out-linux savedefconfig command, and when copying the updated defconfig, use cp ${PWD}/out-linux/defconfig kernel/arch/arm64/configs/tegra_defconfig. You can confirm the options were recorded as shown below.
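As a quick sanity check, grep the generated config for the two options enabled above (CONFIG_NAMESPACES corresponds to Namespaces support, CONFIG_USER_NS to User namespace), assuming the same out-linux build directory:

grep -E 'CONFIG_NAMESPACES=|CONFIG_USER_NS=' ${PWD}/out-linux/.config
# Both options should be reported as =y.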
For step 11.d, replace instances of <kernel_version> in the code snippet with 5.15.98-rt-tegra before executing the commands.
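If the step 11.d snippet is kept in a script file, the substitution can be done mechanically; a sketch, where step11d.sh is a hypothetical filename for a file holding that snippet:

# step11d.sh is a hypothetical file containing the step 11.d commands
sed -i 's/<kernel_version>/5.15.98-rt-tegra/g' step11d.sh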
For step 12.c.i, replace the <fstype> string with standard before saving the new update_rfs.CONFIG.json file. Before running Build-FS per #12.c.i.2, execute the following snippet to remove all kernel-module YAML entries that target the updates/dkms path:
python3 -B - << END
import yaml

# Manifest shipped with DRIVE OS that lists kernel modules to copy to the target.
manifest = '/drive/drive-linux/filesystem/copytarget/manifest/copytarget-kernel-modules.yaml'
with open(manifest, 'r') as f:
    data = yaml.safe_load(f)
# Keep only fileList entries whose destination is not under updates/dkms.
data['fileList'] = [entry for entry in data['fileList']
                    if 'updates/dkms' not in entry['destination']]
with open(manifest, 'w') as f:
    yaml.dump(data, f)
END
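As a sanity check, confirm that no updates/dkms destinations remain in the manifest after the snippet has run:

grep -c 'updates/dkms' /drive/drive-linux/filesystem/copytarget/manifest/copytarget-kernel-modules.yaml
# Expect a count of 0.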
Dear @lizhensheng,
Do these steps have to be done by developers on the DRIVE AGX Orin?
And is this a bug or a feature?
This has to be done in the Docker container on the host, and the target then has to be re-flashed. The primary kernel released with 6.0.8.1 does not provide namespace support by default, and the names of GPU-related devices have also been modified within the filesystem. Anyone looking to run Docker containers will therefore experience a "namespace creation reexec command failed" error. We have provided the WAR steps above.
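After re-flashing, a quick way to confirm on the target that namespace support is present; a hedged sketch (whether the Tegra kernel exposes /proc/config.gz is an assumption):

zcat /proc/config.gz 2>/dev/null | grep -E 'CONFIG_NAMESPACES=|CONFIG_USER_NS='
# Both should show =y; a basic container should then start:
sudo docker run --rm hello-world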
nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --privileged --network host --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
[sudo] password for nvidia:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 618.30 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.