No changes in menuconfig while rebuilding kernel for docker enabling

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.10.0
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
2.1.0
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Issue Description
I want to run some docker containers directly on the Drive AGX Orin.
I followed the NVIDIA technical blog here. The example failed with the docker daemon network-bridge error shown below.
That error led me to a previous post on this forum, [BUG] failed to start docker container.
One of the comments (Nov 3 '23) describes steps to modify the kernel rebuild procedure in order to enable Namespace support and User namespace support.
I followed the steps to recompile the kernel and discovered in step 7 that namespace support and user namespace support were already enabled.

If namespace support and user namespace support are already enabled on my DRIVE AGX Orin, why is there still an error? Should I continue with the kernel rebuild and flash the target?
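For what it's worth, the namespace options can be checked against a kernel config file directly, without stepping through menuconfig. A minimal sketch (the helper name `check_docker_ns_config` is hypothetical; the `/proc/config.gz` path is an assumption that holds on many embedded Linux targets):

```shell
# check_docker_ns_config <config-file>
# Greps a kernel .config for the options Docker bridge networking and
# user namespaces rely on, and prints the matching lines.
check_docker_ns_config() {
  grep -E '^CONFIG_(NAMESPACES|USER_NS|NET_NS|VETH)=' "$1"
}

# On the target, the running kernel's config is often exposed as
# /proc/config.gz (decompress it first):
#   zcat /proc/config.gz > /tmp/running.config
#   check_docker_ns_config /tmp/running.config
```

Note that "failed to add the host pair interfaces" typically points at the veth driver (CONFIG_VETH) rather than the namespace options themselves, which is why it is worth checking CONFIG_VETH alongside CONFIG_NAMESPACES and CONFIG_USER_NS.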

Error String

docker: Error response from daemon: failed to create endpoint on network bridge

Logs
Provide logs in text box instead of image

Please paste the complete application log here. If there are multiple logs, please use multiple text box

Dear @adityen.sudhakaran,
I notice the matrixMul sample works out of the box in DRIVE OS 6.0.10.
Do you see any issue with the below command?

nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --privileged --network host --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
Unable to find image 'ubuntu:20.04' locally
20.04: Pulling from library/ubuntu
1b9f3c55f9d4: Pull complete
Digest: sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b
Status: Downloaded newer image for ubuntu:20.04
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 618.02 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Hi @SivaRamaKrishnaNV,

Yes, I see the docker daemon error:

docker: Error response from daemon: failed to create endpoint on network bridge

Dear @adityen.sudhakaran,
Could you share the complete command and log? Are you using DRIVE OS 6.0.10?

Hi @SivaRamaKrishnaNV,

The command I was running was:

nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo make

After a successful make, I ran the below command, following this blog post:

nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul

The error I was getting was:

docker: Error response from daemon: failed to create endpoint on network bridge: failed to add the host pair interfaces: operation not supported.

While recompiling the kernel, I noticed that menuconfig already had namespace support and user namespace support enabled, and I was curious how that could be.

I recompiled the kernel and reflashed the Orin anyway. Now I get the following:

nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ uname -a
Linux tegra-ubuntu 5.15.122-rt-tegra #1 SMP PREEMPT_RT Tue Apr 23 18:51:49 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ cat /etc/nvidia/version-ubuntu-rootfs.txt 
6.0.10.0-36101120
nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo make
>>> GCC Version is greater or equal to 4.7.0 <<<
/usr/local/cuda-11.4/bin/nvcc -ccbin g++ -I../../common/inc  -m64    --threads 0 --std=c++11 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o matrixMul.o -c matrixMul.cu
/usr/local/cuda-11.4/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o matrixMul matrixMul.o 
mkdir -p ../../bin/aarch64/linux/release
cp matrixMul ../../bin/aarch64/linux/release
nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --privileged --network host --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 618.06 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

It works now, but I'm not sure whether that is due to the recompiled kernel or because, as you state, docker works out of the box on DRIVE OS 6.0.10.

Your previous command did not include the --privileged and --network host flags.
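For context on why those flags matter: with Docker's default bridge network, the daemon must create a veth interface pair in the kernel, which is exactly what "failed to add the host pair interfaces" points at; with --network host the container shares the host's network stack and no veth pair is created. A rough sketch of that distinction (the helper `uses_bridge_network` is hypothetical and the option parsing is deliberately simplified):

```shell
# uses_bridge_network <docker-run-command...>
# Prints "bridge" if the docker run command line would use the default
# bridge network (and therefore needs kernel veth support), or
# "no-bridge" if it opts out via --network host / --network none.
uses_bridge_network() {
  case " $* " in
    *" --network host "*|*" --network none "*|*" --net=host "*) echo "no-bridge" ;;
    *) echo "bridge" ;;
  esac
}
```

By this reading, the earlier failing command (no --network flag) falls on the "bridge" path, while the working one with --network host skips veth creation entirely.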

Hi @SivaRamaKrishnaNV,

Yes, I think that made the difference, and rebuilding the kernel was not required. Nice to see it working.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.